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Abstract:  In  tins  paper  w«,show>how  the  theory  of  dynamical  systems  can  be  employed 

to  solve  problems  in  motion  vision.  In  particular  we'd  ''velopralgorithms  for  recovery  of  dense 
depth  maps  and  motion  parameters  using  state  space  observers  or  filters.  Webegin^with  a 
review  of  previous,  related  work  followed  by  an  overview  of  relevant  aspects  of  the  theory 
of  dynamical  systems.  Three  dynamical  models  of  the  imaging  situation  are  presented: 

the  first  model  we,  assume; that  motion  is  known  and  a  reflectance  model  for  the 
surface  is  given.  Depth  is  recovered  directly  from  brightness  measurements  with  a 
nonlinear  Kalman  filter. 

The  second  model  assumes  that  a  planar  surface  is  being  viewed.  Motion  and  depth 
are  recovered  with  a  nonlinear  observer. 

•  The  third  model  makes  no  explicit  assumptions  about  motion  or  surface.  It  is  embed¬ 
ded  in  a  system  that  iteratively  improves  an  estimate  for  both  the  motion  parameters 
and  a  dense  depth  map  using  Kalman  filters. 

No  feature-matching  preprocessor  is  required.  Solutions  to  other  motion  vision  problems 
(tracking,  camera  parameter  estimation)  using  dynamical  systems  theory  are  suggested. 
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1  Introduction 

Research  in  motion  vision  has  previously  focussed  on  the  interpretation  of  two  successive 
image  frames  in  which  either  the  camera  or  parts  of  the  environment  are  in  motion.  Conse¬ 
quently,  the  results  of  these  methods  are  instantaneous  and  make  no  use  of  the  redundancy 
inherent  in  a  series  of  frames.  Examples  of  such  instantaneous  analyses  include:  estima¬ 
tion  of  the  optical  flow  (Hildreth  [19],  [20],  Horn  and  Schunck  [27],  Nagel  and  Enkelmann 
[40],  [39],  [41],  Anandan  [2],  Fennema  and  Thompson  [12]),  recovery  of  motion  parameters 
(Longuet-Higgins  and  Prazdny  [35],  Negahdaripour,  Weldon  and  Horn  [42],  [28],  Weng, 
Huang  and  Ahuja  [64],  Fennema  and  Thompson  [12],  Tsai  and  Huang  [55],  [56],  Waxman 
and  Wohn  [63],  Roach  and  Aggarwal  [47],  Prazdny  [45],  Waxman  and  Ullman  [62])  and 
depth  measurement  (Kanatani  [31],  Prazdny  [45],  Waxman,  Sinha  and  Ullman  [62],  [61], 
and  Mitiche  [38]). 

Recently,  some  proposals  have  been  made  for  making  use  of  more  than  two  frames  out 
of  an  image  sequence.  Additional  frames  contain  redundant  information  that  may  provide 
additional  constraints  on  an  otherwise  ill-posed  problem. 

Several  categories  of  multiframe  motion  algorithms  have  evolved:  A  first  distinction  can 
be  made  with  respect  to  the  format  of  the  input  data. 

•  Feature-based  methods.  These  methods  assume  that  the  localization  and  correspon¬ 
dence  of  features  has  been  established  beforehand  by  some  other  module.  Since 
features  are  usually  sparse  depth  cannot  be  recovered  densely. 

•  Non- feature-based  methods.  In  most,  of  these  algorithms  a  brightness  constancy  as¬ 
sumption  similar  to  the  one  first  derived  by  Horn  and  Schunck  [27]  is  made  to  deter¬ 
mine  how  the  projections  of  real  world  points  move  in  the  image  plane. 

The  second  distinction  deals  with  the  input  data  organization.  The  approaches  taken 
here  are 

•  Recursive  algorithms.  These  methods  commonly  maintain  some  current  estimate, 
accept  one  frame  of  input  data  at  a  time  and  continuously  update  the  estimate. 

•  Global  (batch)  algorithms.  Here,  all  of  the  input  data  is  accepted  at  once  and  some 
global  criterion  is  optimized  as  opposed  to  the  local  improvement  achieved  by  the 
recursive  schemes.  The  main  disadvantage,  however,  is  the  necessity  of  storing  large 
amounts  of  data  and  not  being  suitable  for  “on-line”  applications. 

One  example  of  a  structure-from-motion  algorithm  that  exploits  information  in  a  se¬ 
quence  of  frames  is  Ullman’s  incremental  rigidity  scheme  [58].  In  this  scheme  a  set  of 
feature  points  is  tracked  through  a  sequence  of  images.  Beginning  with  an  initially  flat 
model  the  algorithm  incorporates  the  incoming  information  about  the  displacement  of  the 
features  and  updates  the  model  while  maximizing  its  rigidity.  The  incremental  rigidity- 
scheme  is  a  recursive  feature-based  method  and  therefore  suitable  for  on-line  recovery  of 
depth  information  in  sparse  locations. 

Another  feature-based  algorithm  is  the  one  presented  l>v  Sethi  and  Jain  [49].  They 
assume  that  a  number  of  features  has  been  localized  in  each  frame  of  an  image  sequence'. 
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They  then  show  how  the  smoothness  of  motion  can  he  employed  to  find  correspondences 
among  these  features  i.e.  they  propose  a  solution  to  1  lie  correspondence  problem  over  a  set  of 
motion  frames.  Using  a  greedy  optimization  algorithm,  they  address  the  following  problem: 
given  m  points  in  each  of  n  frames,  determine  m  trajectories.  The  solution  is  found  by 
minimizing  a  functional  capturing  a  general  smoothness  constraint  for  the  trajectory  path. 

Tsai  [54]  presents  a  surface  reconstruction  procedure  for  known  viewing  orientations 
extended  to  n  viewing  positions.  He  shows  that  the  multiframe  method  provides  higher 
accuracy  while  only  slightly  increasing  the  computational  complexity.  This  will  be  the  deci¬ 
sive  efficiency  measure  for  multiframe  algorithms:  can  the  accuracy  be  increased  sufficiently 
with  respect  to  the  additional  computational  cost. 

Tracking  is  another  application  of  multiframe  motion  vision.  Usually  the  idea  is  to 
control  the  motion  of  a  camera  viewing  a  certain  moving  object  such  that  the  object  image 
remains  nearly  fixed  in  the  image  plane.  This  application  is  mostly  feature-based  and  an 
example  of  a  purely  recursive  scheme.  There  are  numerous  publications  on  this  subject. 
Extensive  reference  lists  are  provided  by  [10]  and  [3]. 

Shariat  and  Price  [50]  use  point  correspondences  in  more  fhan  two  frames  to  estimate  the 
motion  of  a  moving  object.  They  exploit  the  fact  that  after  compensating  for  translation, 
points  on  the  observed  object  will  move  in  circles  in  space  according  to  the  rotational 
velocity.  The  motion  estimation  problem  can  then  be  reduced  to  the  problem  of  computing 
circle  parameters  from  point  correspondences.  This  approach  can  be  classified  as  a  feature- 
based  global  solution  since  measurements  from  all  frames  must  be  available  before  the 
parameters  can  be  computed. 

Another  recursive  depth  estimation  procedure  for  more  than  two  frames  was  suggested 
by  Bharwani,  Riseman  and  Hanson  [5],  A  correlation  based  method  similar  to  the  one 
employed  in  section  6  is  used  to  estimate  displacements  of  features  between  successive 
frames.  Under  the  assumption  of  known  translational  motion  the  depth  is  then  easily 
recovered.  This  recursive  approach  is  not  really  restricted  to  identifiable  features  and  pure 
translation  but  it  becomes  much  easier  to  handle  mathematically.  An  important  feature  of 
the  algorithm  is  that  it  exploits  the  fact  that  motion  is  known  to  predict  the  displacement  in 
the  next  frame  and  thereby  limit  the  search  in  the  correlation- based  estimator.  However,  a 
more  physically  motivated  strategy  for  predicting  and  updating  the  estimates  that  involves 
the  minimization  of  measurement  errors  seems  necessary.  In  addition,  we  would  like  to 
avoid  restricting  the  motion  to  known  translation. 

The  above  methods  employ  different  approaches  for  dealing  with  the  data  sampled  over 
time,  which  are  largely  motivated  by  physically  intuitive  constraints  imposed  upon  the 
problem  (for  instance  rigidity  of  the  object,  smoothness  of  the  trajectories).  Researchers 
have  been  looking  for  a  more  systematic  way  in  which  to  incorporate  the  temporal  relation¬ 
ship  among  the  successive  measurements.  Recently  problems  in  motion  vision  have  been 
formulated  in  a  way  that  deals  explicitly  with  the  time-dependency  of  the  observed  data. 
Such  a  formulation  is  provided  by  the  theory  of  dynamical  systems. 

One  of  the  earliest  applications  of  dynamical  systems  methodology  may  be  found  in 
Stuller  and  Krishnamurthy  [52].  As  we  will  see.  a  Kalman  filter  is  a  means  of  estimating 
states  of  a  dynamical  system  when  only  a  function  of  these  states  is  measurable  as  output. 
Stuller  and  Krishnamurthy  formulate  the  projected  translational  motion  of  an  object  in 
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terms  of  dynamical  systems  equations  in  which  the  interframe  displacements  appear  as 
states  so  that  a  Kalman  filter  can  be  employed  to  estimate  them.  The  authors  present 
experimental  applications  of  their  method.  Apart  from  the  restriction  to  translational 
motion  the  approach  has  a  severe  flaw:  the  underlying  dynamical  equations  constitute  a 
merely  heuristic  association  of  the  variables  needed  for  the  estimation.  When  mathematical 
models  are  not  based  on  laws  of  physics  their  relationship  to  reality  becomes  unclear. 

Besides  investigating  the  relationship  between  the  motion-field  and  the  optical  flow, 
Verri  and  Poggio  [60]  noted  that  the  motion  field,  the  projection  of  the  3D  velocity  field 
into  the  image  plane,  is  a  vector  field  tangential  to  the  trajectories  of  a  dynamical  system 
and  that  the  qualitative  properties  of  this  dynamical  field  are  reflected  in  the  properties  of 
the  optical  flow  field  that  is  commonly  recovered  from  brightness  variations.  They  give  an 
overview  of  the  theory  of  dynamical  systems  from  the  theoretical  perspective  of  [21]. 

Verri,  Girosi  and  Torre  [59]  establish  a  formal  correspondence  between  the  theory  of 
dynamical  systems  and  motion  vision.  Their  basic  observation  is  that  the  motion  field 
equations  form  a  two-dimensional  dynamical  system.  Properties  of  this  dynamical  system 
can  be  inferred  from  its  behavior  near  singular  points  that  may  appear  in  images  as  a 
focus  of  expansion,  for  instance.  They  show  how  the  type  of  motion  as  well  as  quantitative 
measurements  of  rotation  and  translation  may  be  determined  given  the  optical  flow  near 
the  singular  points. 

Direct  applications  of  the  theory  of  dynamical  systems  to  the  estimation  of  object 
kinematics  and  structure  in  motion  vision  have  been  discussed  repeatedly  by  Broida  and 
Chellappa.  In  [6]  the  authors  present  a  dynamical  system  that  describes  the  temporal 
variation  of  3D  coordinates  and  velocity.  They  further  present  a  model  of  the  measurement 
process  corrupted  by  noise.  Given  such  a  formulation,  the  technique  of  optimal  filtering  is 
applied  to  estimate  depth  and  velocity.  The  scheme  suffers  from  an  extremely  simplified 
description  of  object  motion  (only  two  real-world  points  with  known  position  relative  to  the 
center  of  mass  are  described  by  the  model),  the  motion  is  planar  (translation  and  rotation 
in  a  plane  only)  and  the  measurement  equations  assume  known  interframe  correspondences 
(no  relationship  to  the  measurable  brightness  values  is  established). 

In  [7]  the  model  is  generalized  to  n  feature  points  but  a  known  matching  between  the 
frames  is  still  assumed.  Again  the  iterated  extended  Kalman  filter  is  employed  to  perform 
the  recursive  estimation.  Unit  quaternions  are  used  to  represent  interframe  rotation.  No 
application  of  the  technique  to  real  data  is  mentioned,  presumably  because  the  preconditions 
are  yet  too  restrictive.  The  authors  have  also  proposed  a  corresponding  global  solution  [8] 
which  again  shows  the  duality  of  the  two  approaches. 

Interesting  practical  results  are  presented  by  Dickmans  [11].  The  objective  in  this  paper 
is  to  recover  real-world  coordinates  and  motion  parameters  from  the  measured  image  point 
coordinates  of  distinguished  feature  points.  This  requires  assumptions  about  the  underlying 
dynamics  of  the  object  being  viewed  to  obtain  a  complete  dynamical  systems  description, 
which  is  then  used  in  a  Kalman  filter  estimation  procedure.  Although  the  paper  does  not 
present  the  exact  form  of  the  models  employed,  extensive  experiments  involving  docking 
procedures  for  a  vehicle,  landing  of  aircraft  and  guidance  of  motor  vehicles  are  reported. 

Recently  Matthies,  Szeliski  and  Kanade  [37]  have  presented  ideas  very  similar  to  our 
own.  They  propose  a  non-feature- based  recursive  dense  depth  estimation  based  on  Kalman 
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fil Coring.  Motion  is  assumed  to  he  known,  purely  translational  and  parallel  to  the  image 
plane.  A  correlation  estimator  is  used  to  compute  displacements  of  image  points  between 
successive  frames.  If  motion  is  known,  depth  can  be  recovered  in  a  straightforward  manner. 
The  key  idea  is  to  use  the  Kalman  filter  to  update  a  dense  depth  map  given  these  correlation- 
based  measurements  for  every  new  frame.  Although  severely  restricted  by  the  necessity  to 
know  motion,  encouraging  experimental  results  are  presented. 

The  main  intention  of  this  paper  is  to  introduce  a  rigorous  dynamical  systems  framework 
for  the  motion  vision  pre^ss.  This  begins  by  realizing  that  there  is  no  one  dynamical  system 
that  describes  the  behavior  of  successive  frame  measurements  “correctly”.  The  structure 
of  the  system  depends  crucially  on  the  assumptions  made  about  the  imaging  situation, 
which  are  mainly  reflected  in  the  output  equations.  We  describe  three  different  dynamical 
models  of  motion  vision  appropriate  for  various  imaging  situations.  As  an  example  of  how 
techniques  used  in  the  control  of  dynamical  systems  may  be  employed  in  motion  vision  we 
develop  recursive  algorithms  which  do  not  require  the  existence  of  feature-correspondence 
for  the  estimation  of  dense  depth  maps  and  motion  parameters  using  Kalman  filters.  We 
stress  the  importance  of  a  rigorous  physical  motivation  of  the  model  of  the  imaging  process, 
which  is  a  major  flaw  of  many  previous  approaches. 

We  proceed  as  follows.  The  subsequent  section  introduces  the  terminology  and  notation 
of  dynamical  systems.  We  then  describe  our  choice  of  coordinate  systems  and  derive  the 
basic  motion  equations  for  both  the  continuous  and  the  discrete  case.  Then  three  dynamical 
models  of  the  imaging  situation  are  presented  and  corresponding  estimators  for  depth  and 
motion  are  derived. 

2  Dynamical  Systems 

This  section  contains  an  overview  of  the  relevant  material  from  the  theory  of  dynamical 
systems.  The  reader  familiar  with  the  topic  may  wish  to  skip  this  section.  Literature  in 
this  area  is  readily  available.  [36],  [14]  and  [30]  provide  introduction  to  the  general  theory 
dealing  mainly  with  continuous,  linear  systems.  Discrete  systems  -  of  particular  importance 
in  our  case  of  a  sampling  camera  system  -  arc  discussed  in  [32],  [44],  [65]  and  [13].  Special 
topics  such  as  identification  may  be  found  in  [43]  and  [34];  for  filtering  a  widely  used 
reference  (cf.  [52],  [6],  [7])  is  the  one  by  Gelb  [i6j. 

2.1  Basic  Concepts 

We  present  here  the  state  space  model  of  dynamical  systems.  In  the  most  general  case 
such  a  system  is  described  by  a  first-order  vector  differential  equation 


and  an  algebraic  vector  equation 


x(<)  =  f(x(t),u(t)) 


y(t)  =  g(x(t),u(t)). 


Equation  (1)  is  referred  to  as  the  plant  equation  that  describes  the  dynamic  behavior  of  the 
system.  It  states  that  the  temporal  variation  of  the  state  vector  x(  t)  depends  (possibly  in  a 
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nonlinear  fashion)  on  x(t)  itself  and  an  input  vector  u(t).  The  output  equation  (2)  models 
the  measurement  process  in  which  the  output  vector  y(t)  is  obtained  as  a  function  of  the 
current  state  and  the  input. 

Linear  systems  are  a  subset  of  the  above,  the  model  being 

x(<)  =  Ax(i)  +  Bu(£)  (3) 

y(<)  =  Cx(£)  +  Du(().  (4) 

where  A,  B,  C  and  D  are  matrices.  Most  available  methods  operate  on  linear  systems  as 
we  will  see.  We  often  visualize  dynamical  systems  using  block  diagrams  as  in  figure  ( 1). 


Figure  1:  Block  diagram  representation  of  dynamical  systems 


The  major  task  in  the  application  of  dynamical  systems  theory  is  constructing  an  ap¬ 
propriate  model.  We  thereby  encounter  a  tradeoff  between  the  model’s  complexity  and  the 
accuracy  with  which  it  represents  the  actual  dynamics. 

2.2  An  Example 

Consider  a  mass  m  attached  to  a  spring  of  stiffness  k.  The  attachment  of  the  spring 
may  be  moved  by  external  influence  by  an  amount  u(t).  Finally  we  let  x(t )  denote  distance 
between  the  mass  and  the  attachment.  The  arrangement  is  depicted  in  figure  (2). 
Summing  up  the  forces  on  mass  m  provides  the  equation 

m'i  =  mg  -  k(x  -  u).  (5) 
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By  introducing  x  —  x  -  mg/k,  which  simply  describes  the  distance  from  the  equilibrium 
xQ  —  mg/k ,  we  obtain 

k  k  ... 

x  - - x  -)-  — u.  (b) 

m  m 


SL 

•Jfl 

5  ‘3 
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Figure  2:  A  simple  dynamical  system 


Now  we  use  the  dummy  variable  v  —  x  (simply  the  velocity)  to  convert  the  second-order 
ODE  into  two  first  order  ODE’s  which  are  the  desired  plant  equations: 


x  = 


Xl 

X  } 
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l  ’ 

x  + 

’  0 
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L j 

m 
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We  see  that  the  plant  equations  are  linear  corresponding  to  (3).  We  can  measure  the 
behavior  of  this  system  by  the  position  of  the  mass,  which  is  simply  the  first  component  of 
the  state  vector.  So  the  output  equation  is 


y  =  x  =  [l,0]x. 


(8) 


2.3  Discrete  Systems 


For  many  systems,  output  and  state  are  only  defined  at  discrete  points  in  time.  We  may 
also  think  of  such  systems  as  the  result  of  a  continuous  system  being  sampled  periodically. 
Assuming  system  (3),  (4)  being  subject  to  sample  and  hold  of  period  T,  we  can  write 


Xfc+1  =  $xk  +  r\ik 

(9) 

yk  =  Cxi  +  Dut 

where  xk  is  a  shorthand  for  x(Tk),  fc  =  0. 1, . . .  and 

*  -  *AT  =  f:^r 

k-0  h  * 

(10) 

(ID 

r  =  (  jr cAt(It)b. 

Jo 

(12) 

<) 


This  description  is  suitable  for  dynamical  systems  involving  digital  computers,  especially 
for  processing  of  image  frames  taken  at  discrete  points  of  time. 

2.4  Linearization 

Unfortunately,  many  realistic  problems  do  not  lead  to  such  simple  linear  differential 
equations  but  appear  in  the  more  general  form  of  (1),  (2).  Very  few  methods  for  the  tasks 
discussed  in  the  following  sections  are  available,  which  operate  directly  on  the  nonlinear 
description.  A  remedy  is  to  linearize  the  system  by  using  a  Taylor  series  expansion  about 
some  fixed  point  Xo,  uo: 

f(x,  u)  =  f(xo,  u0)  +  (x  Xq)  +  (u-u0)+--  (13) 

Xo  OU  U0 

If  we  assume  that  state  and  input  vary  only  slightly  around  the  point  about  which  f  has 
been  expanded,  we  may  neglect  higher-order  terms  and  write 

Ax  =  AAx-f-BAu  (14) 

where  we  introduced  Ax  =  x  —  xo,  Au  =  u  —  u0  and 

A  =  and  (15) 

ox  xo  on  Uo 

arc  Jacobian?  rr  r  A  similar  argument  leads  to  a  linearized  output  equation  and  we  may 
apply  linear  techniques  to  the  transformed  system. 

2.5  Stability 

We  differentiate  two  definitions  of  stability  (see  [15],  [30],  [36]): 

External  (BIBO)  stability:  A  dynamical  system  as  in  (1),  (2)  is  said  to  be  externally  stable 
if  a  bounded  input  function  u(t)  results  in  a  bounded  output  function  y(t). 

Internal  ( asymptotic )  stability:  A  linear  dynamical  system  as  in  (3),  (4)  is  said  to  be 
internally  stable  if  the  solution  of 

x(t )  =  Ax(t)  (16) 

tends  toward  zero  as  t  — >  oo  for  arbitrary  initial  condition  x(<q)- 

A  necessary  and  sufficient  condition  for  internal  stability  (also  called  asymptotic  stability) 
is  that  all  eigenvalues  of  A  have  negative  real  parts.  An  internally  stable  system  is  also 
externally  stable.  Lyapunov  and  Poincare  proved  that  a  nonlinear  system  is  stable  with 
respect  to  small  perturbations  from  an  equilibrium  Xo  if  the  system  matrix  of  the  system 
linearized  about  Xo  is  stable. 

2.6  Control 

In  the  broadest  sense,  control  is  the  augmentation  of  a  given  dynamical  system  bv  ad¬ 
ditional  systems  in  order  to  arhipve  a  desboH  overall  behavior.  The  most  '•ommon  control 


Plant 


Figure  3:  Feedback  control 


structure  is  feedback  where  the  state  x  is  transformed  by  a  compensator  matrix  R,  com¬ 
pared  with  a  commanded  input  vector  uc  and  the  difference  used  as  input  u  to  the  plant 
as  in  figure  (3). 

We  then  have  that 

u  =  uc  -  Rx  (17) 

which  gives  us  the  overall  plant  equation  of 

x=  (  A  -  BR)x  +  Ruc  (18) 

To  influence  the  overall  behavior  of  this  system  we  may  choose  the  matrix  R.  The  properties 
of  a  dynamic  system  are  determined  by  the  position  of  the  eigenvalues  of  the  system  matrix 
(cf.  stability).  A  common  criterion  for  the  choice  of  the  elements  of  R  is  therefore  to  achieve 
desired  locations  of  the  poles  of  A  —  BR.  This  is  called  pole  placement. 

2.7  Observers  and  Filters 

We  are  often  interested  in  knowing  the  state  x(t)  of  our  dynamical  system  but  can  only 
measure  the  output  y(t),  for  instance  in  the  case  of  feedback  control.  An  observer  is  a 
dynamical  system  that  reconstructs  the  state  x(t)  given  the  output  y(t)  and  the  input  u(t) 
when  the  dynamics  of  plant  and  output  are  known  (figure  (d)). 
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Plant 


Measurement 


X  = 


x=o(x,u,y) 


Observer 


Figure  4:  The  observer  principle 


In  the  linear  case  (3),  (4)  we  use  the  following  observer  structure 

x  =  Ax  +  Bu  +  E(y  -  Cx  -  Du)  (19) 

where  x  denotes  our  estimate  for  the  state  vector  x  and  E  is  a  matrix  that  we  may  choose 
freely.  We  see  that  this  is  a  copy  of  the  original  plant  equation  augmented  by  a  term  for 
the  error  between  estimated  and  actual  measurement  value.  A  criterion  for  the  choice  of  E 
may  be  derived  from  the  error  e  =  x  —  x.  From  (19)  and  substituting  (4)  for  y  we  have 

c  =  (A-EC)e  (20) 

which  means  that  the  dynamics  of  the  error  depend  or.  the  eigenvalues  of  A  -  EC.  We 
may  therefore  influence  the  dynamics  of  this  error  using  pole  placement  techniques  on  this 
matrix.  We  will  at  least  require  this  matrix  to  be  asymptotically  stable  in  order  for  the  error 
to  disappear  eventually.  For  discrete  systems  the  error  can  vanish  after  a  finite  number  of 
sampling  periods  (dead  beat  response).  This  can  be  achieved  by  placing  all  poles  of  the 
discrete  observer  at  the  origin. 

A  Kalman  filter  is  derived  from  an  observer  of  a  linear  system  as  in  (3).  (1)  where 
state  and  output  are  corrupted  by  Gaussian  noise  of  a  known  distribution.  State,  input 
and  output  are  interpreted  as  random  processes  with  a  known  distribution  of  the  initial 
state  vector  x(0).  The  filter  then  consists  of  two  stages  for  every  iteration:  a  prediction 
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|>Ii;i.si>  ill  which  the  next  slate  vector  is  computed  |>y  using  the  (  urrcnl.  stale  eslimaie  in 
the  state  equations  and  an  update  phase  in  which  the  current  estimate  is  altered  according 
to  the  error  between  expected  and  actual  measurement  just  as  we  have  seen  for  observers. 
The  key  difference  between  observers  and  Kalman  filters  is  that  the  gain  matrix  E  depends 
on  the  variance  of  the  measurements,  or  in  other  words,  a  discrepancy  between  predicted 
and  actual  measurement  influences  the  estimate  in  proportion  to  the  confidence  we  have 
in  that  measurement.  This  provides  the  filter  with  the  ability  to  estimate  optimally  the 
state  vector  in  the  sense  of  minimal  noise  corruption,  but  places  the  burden  of  determining 
the  variances  on  the  designer.  This  methodology  may  be  extended  to  nonlinear  systems  In- 
linearizing  the  state  and  output  equations  around  the  current  estimate  in  every  iteration. 
This  is  known  as  the  Extended  Kalman  Filter  (refer  to  [1C]).  We  have  summarized  the 
equations  of  the  Extended  Kalman  Finer  below.  Observe  that  they  reduce  to  a  simpler 
form  in  the  special  case  of  linear  system. 


The  Extended  Kalman  Filter 


(21  ’ 


System  Model 

Output  Model 

Xfc+i  =  f(xjt.Ufc)  +  w k  \vfc  ~  N{ O.Q*) 

y k  =  g( Xfc )  -1-  Vi.  vfc  ~  .V(0,  Ri ) 

Initial  State 

Non-correlation 

x0  ~  A'  (  xo-  Po) 

E[wkv'l  ]  =  0  for  all  A- 

State  Estimate  Propagation 
Error  Covariance  Propagation 

*fc+i  = 

pj:+,=*k.Pt*I  +  Qk 

State  Estimate  Update 

Error  Covariance  Update 

Filter  Gain 

=  K  +  E*[y k  -  g(** )] 

P+  =(I-EfcCfcr)P,T 

Efc  =  Pkck{cjpj;ck  +  Rk}-' 

We  have  used  the  abbreviations  ^  .  for  the  system  Jacobian,  Ck  =  _ 


x* 


for  the  output  Jacobian  and  r  ~  Ar(p,cr)  for  a  normally  distributed  random  variable  x 
with  expected  value  p  and  variance  n.  The  Kalman  filter  forms  the  core  of  the  estimation 
algorithms  for  motion  vision  presented  below. 


3  Dynamical  Systems  for  Motion  Vision 


Before  formally  modelling  the  imaging  situation  as  a  dynamical  system  we  provide  some 
insight  into  why  such  an  interpretation  may  apply.  Recall  that  a  dynamical  system  consists 
of  a  state  that  changes  in  time  according  to  a  plant  equation  (1)  and  a  measurable  output 
which  is  a  function  of  that  state  at  every  instant  (2).  In  the  imaging  situation  in  motion 
vision  the  position  and  attitude  of  an  object  being  viewed  changes  in  time  according  to  the 
kinematic  equations.  Only  a  function  of  position  and  attitude,  namely  the  projection  into 
the  image  plane,  can  be  measured  at  every  instant.  We  could  identify  position  and  attitude 
as  the  states  of  our  system,  the  kinematic  equations  as  the  plant  and  the  2D  projection  as 
on r  measurement. 
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However,  the  interpretation  is  not  quite  that  simple.  We  are  not  really  measuring  the 
position  and  orientation  of  the  2D  projection.  We  merely  have  an  array  of  brightness  values 
available  in  every  frame.  One  approach  would  then  be  to  determine  first  the  location  and 
orientation  of  the  projection  from  the  grey-level  array  and  use  the  result  as  a  new  “meta”- 
measurement.  This  leads  to  the  feature-based  approaches  mentioned  in  the  introduction. 
An  alternative  is  to  make  assumptions  about  the  formation  of  the  brightness  value  and 
in  particular  how  a  change  in  brightness  relates  to  a  change  in  position  and  orientation. 
The  most  common  assumption  is  the  brightness  change  constraint  equation  of  Horn  and 
Schunck  [27].  The  idea  there  is  that  the  brightness  of  a  patch  corresponding  to  a  fixed 
location  on  the  surface  being  viewed  will  not  change  significantly  from  one  image  to  the 
next. 

A  conclusion  we  can  draw  from  these  thoughts  is  that  the  major  emphasis  in  finding 
suitable  dynamical  models  for  the  motion  imaging  situation  will  be  on  constructing  appro¬ 
priate  measurement  models.  Since  there  is  no  unique  model  for  a  given  system  a  major  task 
is  to  investigate  different  models  and  identify  those  that  represent  the  temporal  behavior 
most  accurately  while  remaining  at  a  low  level  of  complexity  (nonlinearities,  order  of  the 
system,  etc.). 

We  have  constructed  three  models  of  the  imaging  situation  that  differ  mainly  in  the 
type  of  measurement  assumed.  All  models  include  the  depth  (distance  to  the  object)  as 
a  state  that  is  not  measurable  in  the  output.  The  main  idea  is  to  employ  a  state  space 
observer  or  a  Kalman  filter  as  described  in  section  2.7  to  estimate  the  depth.  One  would 
also  like  to  recover  the  motion  of  object  or  observer.  Note  that  an  observer/filter  solution 
would  then  require  the  presence  of  some  plant  equation  describing  the  temporal  evolution 
of  the  motion  parameters.  In  the  case  of  object  motion  this  is  rarely  ever  known  and  we 
can  only  hypothesize  a  model  as  done  in  section  5.  When  the  camera  is  in  motion  we  may 
use  the  dynamics  of  the  actuating  device  to  obtain  such  a  relationship  as  we  suggest  in 
section  7.  We  believe  that  the  best  procedure  is  to  estimate  depth  with  a  Kalman  filter 
and  use  the  resulting  depth  map  to  compute  motion  in  a  least-squares  fashion.  This  is  the 
approach  taken  in  section  6. 

The  first  model  we  present  is  in  a  sense  the  physically  “correct”  model  as  it  attempts 
to  establish  the  relationship  between  the  measured  brightness  value  variation  and  the  ob¬ 
ject/observer  motion.  This  requires  an  assumption  about  the  reflectance  properties  of  the 
surface  (lambertian,  specular  etc.)  and  is  therefore  rather  restrictive.  We  also  assume  that 
motion  is  known.  A  dense  depth  map  is  recovered  using  an  Extended  Kalman  Filter. 

The  second  model  employs  the  brightness  change  constraint  assumption  to  avoid  mod¬ 
eling  the  surface  reflectance  and  illumination  conditions  explicitly.  Since  the  brightness 
change  constraint  equation  is  an  implicit  equation,  this  requires  a  modification  of  the  gen¬ 
eral  observer  scheme.  The  other  important  characteristic  of  this  approach  is  that  the  surface 
being  viewed  is  assumed  to  be  planar,  which  allows  us  to  parametrize  it  and  recover  the 
surface  parameters  rather  than  a  dense  depth  map.  We  also  show  how  the  use  of  a  simple 
motion  model  permits  us  to  recover  motion  within  the  observer. 

Section  6  finally  presents  a  system  that  estimates  both  motion  and  depth  recursively. 
A  dense  depth  map  is  obtained  either  directly  from  brightness  values  or  from  a  correlation- 
based  displacement  measurement  using  the  current  estimate  for  the  motion  parameters. 


This  constitutes  the  update  section  of  a  non-linear  Kalman  filter.  The  new  depth  map 
is  then  used  to  find  a  new  least-squares  estimate  for  the  current  motion  and  a  predicted 
depth  map  for  the  subsequent  frame.  This  is  formulated  as  the  prediction  stage  of  a  Kalman 
filter.  This  scheme  requires  no  assumptions  about  surface  reflectance,  shape,  illumination, 
or  motion  (besides  the  ones  inherent  to  the  brightness  constancy  assumption,  see  Verri  and 
Poggio  [60])  and  is  therefore  best  suited  for  implementation  and  processing  of  real  imagery. 

The  remainder  of  this  section  is  devoted  to  the  introduction  of  basic  terminology  for 
the  imaging  situation,  choice  of  coordinate  system,  etc.,  which  is  similar  to  that  of  Horn 
[25]  and  Negahdaripour  [42]. 

3.1  Coordinate  Systems 

We  will  use  a  coordinate  system  with  origin  coincident  with  the  center  of  projection  of 
the  camera,  the  z-axis  pointing  towards  the  image  plane.  The  image  plane  is  parallel  to  the 
x-y-plane  at  unit  distance  from  the  origin  i.e.  we  express  image-plane  coordinates  in  units 
of  focal  length.  The  situation  is  shown  in  figure  (5). 


Figure  5:  The  viewer  centered  coordinate  system 


A  point  on  an  object  in  the  real  world  is  represented  by  the  vector 


and  the  projection  of  that  point  into  the  image  plane  by 

r  =  [x,y,  1]T.  (23) 

The  perspective  projection  equation  contains  the  relationship  between  both  points: 

R  R 

r~  R  z~  Z  (  ^ 

where  z  denotes  a  unit  vector  along  the  2-axis. 

3.2  Continuous  Motion 

We  assume  that  object  and  camera  are  in  relative  motion  given  by  a  translational 
vector  t (f)  and  a  rotational  vector  w(t)  in  the  coordinate  system  of  the  camera.  In  [18]  we 
showed  that  the  motion  perceived  in  the  image  plane  is  the  relative  motion  of  object  and 
camera  if  both  object  and  camera  motion  parameters  are  defined  in  a  coordinate  system  of 
the  camera.  We  also  showed  that  the  generalization  from  instantaneous  velocities  t.  u>  to 
velocity  functions  of  time  t(t),  u(t)  is  valid. 

Consider  a  point  on  the  observed  object  with  position  vector  R (<).  The  motion  of  the 
point  on  the  object  is  given  by 

R(<)  =  -t(<)  -  u(t)  x  R (t)  (25) 

and  the  corresponding  motion  field  in  the  image  plane  is 

r=-zx(rx(rxw-~)).  (26) 

This  is  also  referred  to  as  the  motion  field  equation. 

3.3  Discrete  Motion 

We  will  discover  that  although  the  above  continuous  formulation  accounts  precisely  for 
the  changes  due  to  motion,  a  discrete  description  is  better  suited  for  the  purpose  of  con¬ 
structing  a  computer  algorithm  to  implement  dynamical  systems  techniques.  This  means 
that  we  must  use  finite  rotations  and  translations  instead  of  the  infinitesimal  quantities  t 
and  w.  There  are  numerous  ways  of  representing  finite  interframe  transformations.  We  will 
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use  a  displacement  vector  p  for  the  translation  and  a  unit  quaternion  <1  for  the  rotation. 
Unit  quaternions  are  an  efficient  means  of  representing  rotations,  used  in  vision  by  [7],  [22], 
[24].  Information  on  the  subject  is  available  in  [51],  [53],  [22]  and  [48]  where  the  latter 
provides  an  excellent  comparison  of  various  representations  of  rotation. 

For  our  purposes  it  will  be  sufficient  to  understand  a  unit  quaternion,  which  we  will 
denote  by  a  circle  above  a  character  in  what  follows,  as  a  four-dimensional  unit  vector  or  a 
generalized  complex  number 


=  <7o  +  iqx  +  jqy  +  kqz 


y.j. 


Rotations  about  an  axis  along  the  unit  vector  u)  =  [wr,  ujy,  lj2]t  by  an  angle  0  are  represented 
by  the  unit  quaternion 

0  0  6 

q  =  cos  -  4-  sin  - (iux  +  juy  -f-  kuz).  (28) 

O 

Vectors  are  purely  imaginary,  scalars  purely  real  quaternions.  So  if  <7  denotes  a  rotation  as 
above  and  x  is  the  quaternion  corresponding  to  some  vector  x  then  the  rotated  quaternion 


x^qxq*^  0  (fo  +  Qi-%-rz)  2(qxqy-q0q2)  2(qxqz  +  q0qy)  ^9) 
0  2(qxqy-q0q,)  (?o  “  ~  Ql)  2 (qyqz  ~  qoqx) 

.  0  2 {qxqz  -  qQqy)  2 (qyqz  -  q0qx)  (q%  -  q2x  -  q2y  +  q])  . 

where  q  is  the  conjugate  quaternion  to  q. 

How  does  a  rotation  and  translation  in  this  representation  change  a  real-world  posi- 

0  .  0 

tion  vector  amd  its  quaternion  equivalent  Rk  at  discrete  time  k ?  We  rotate  R k  by  a 

O  O 

quaternion  qk  using  (29)  and  translate  by  a  finite  displacement  quaternion  Pk  to  obtain 

O 

Rk+l- 

Rk+ 1  -  Pk  +  QkRkvl  (30) 


The  discrete  motion  field  (the  displacement  field  rk+l  -  rk)  is  determined  by 

°  £*+1  Pk  +  (Qkhl^Zk  /0iN 

rk+i  -  -x - -  - — 0  o'» - T  (31) 

Zk+1  {(Pk  +  (<lkrkqk)Zk)  ■  z ) 

where  a  is  a  unit  quaternion  with  z-component  1.  The  denominator  of  the  last  term  is 
simply  the  last  component  of  equation  (30). 

4  A  Discrete  Model  for  Brightness  Measurements 

The  first  decision  when  modeling  a  dynamical  system  is  which  quantities  should  be 
included  in  the  state  vector  and  which  quantities  are  measurable  as  output.  We  have 
already  indicated  the  importance  of  the  measurement  model,  which  we  derive  first  in  this 
section.  We  designate  the  brightness  values  E(x,y)  to  be  the  measurements  in  this  model 
where  x  and  y  are  image  plane  coordinates.  The  measurement  model  must  then  relate 
E(x,y)  to  the  time- varying  quantities  in  the  svstem  such  as  the  location  and  orientation  of 
the  object  being  viewed.  Depending  on  which  relationship  we  obtain,  we  must  then  compile 
all  time- varying  quantities  therein  into  the  state  vector  and  formulate  a  plant  equation  that 
governs  their  temporal  evolution. 

Another  major  decision  is  whether  to  formulate  the  model  discretely  or  continuously. 
Since  the  measurements  are  taken  discretely  and  the  Kalman  filter  is  inherently  a  discrete 
algorithm,  we  decided  to  formulate  the  model  discretely. 


| 


wo 


V 


4.1  An  Output  Equation  for  Brightness  Values 

We  have  designated  the  brightness  values  in  the  image  to  be  our  output  or  measure¬ 
ment.  To  derive  a  corresponding  output  equation  for  use  in  our  dynamical  system  we  must 
determine  the  relationship  between  the  brightness  E  at  some  point  ( x ,  y)  in  the  image  plane 
and  the  time- varying  parameters  of  the  system  such  as  position  and  attitude,  which  we  later 
will  include  in  the  state  equations.  We  need  some  basic  physical  facts  about  the  imaging 
situation  that  we  have  taken  from  [25],  chapter  10. 


Figure  6:  The  formation  of  brightness  values 


Consider  a  point  P  on  an  object  whose  position  vector  forms  an  angle  a  with  the  optical 
axis  and  whose  radiance  is  L  (figure  6).  The  image  brightness  at  the  projection  of  P  into 
the  image  plane  is 

E  =  L  j)2  cos4  a  (32) 

where  d  is  the  diameter  of  the  camera  lens  and  /  is  the  focal  length  which  we  scale  to  1  in 
what  follows.  The  ccsine  can  be  expressed  in  terms  of  the  real-world  or  image  coordinates 

COS  a  =  ■■■  =====  =  ■=::■  =  ■  y  -  1=  =====  (33) 

v/x2  +  Y2  +  Z  2  s/TT^Ty1 


so  we  have 


Lxd2Z4 

4(.Y2  +  r2+  Z2)2' 


V  r 


Unfortunately,  L  depends  on  the  type  of  surface  being  viewed  and  the  illumination.  On 
the  other  hand  it  is  intuitive  that  we  will  not  be  able  to  model  image  brightness  values 
correctly  without  some  assumptions  about  illumination  and  surface  reflectance.  The  two 
most  widely  used  surface  models  are  Lambertian  and  specular.  For  the  Lambertian  surface 
we  have  (see  for  instance  [25]) 

L  -  p(  i  ■  n)  (35) 

where  p  is  the  surface  albedo,  i  is  a  unit  vector  in  the  direction  of  the  light  source  and  n  is 
a  unit  vector  along  the  surface  normal  of  the  point  being  viewed. 

Several  models  have  been  proposed  for  specular  surfaces.  The  one  developed  by  Tuong- 
Phong  [57]  and  applied  to  computer  vision  by  Horn  [26],  [23]  models  the  radiance  of  a 
surface  illuminated  by  an  extended  source  as 

L  =  A(5gi,»  (36) 

where  A  is  a  specular  reflectance  coefficient,  n  a  parameter  that  describes  the  compactness 
of  the  specular  patch,  and  s  is  a  unit  vector  in  the  direction  of  perfect  specular  reflectance 
which  can  be  written  as 

s  =  2(i  •  n)n  -  i.  (37) 

In  practice  a  linear  combination 


L  —  iLlambertian  T  (1  t)LSpecu[ar 


(38) 


provides  a  good  approximation  in  which  t  may  vary  spatially. 

Using  the  expressions  for  the  radiance  (35),  (36)  in  the  equation  for  the  image  brightness 
(34)  yields 


E  = 


p*d2Z4 

"PF 


(i-n) 


(39) 


and 


E  = 


Ax  d2Z4 
4|R|4+n 


(2(i  •  n)(R  •  n)  -  (i  ■  R))n. 


(40) 


We  see  that  in  both  cases  E  depends  on  the  time- varying  quantities  R  (the  position  vector) 
and  n  (the  surface  normal).  This  means  we  must  include  these  variables  in  our  state  vector. 

We  note  that  in  both  the  specular  and  the  Lambertian  case,  we  can  use  r  instead  of  R 
in  the  expressions  for  E.  The  time  dependency  of  r  is  more  complex  than  the  one  for  R  as 
we  see  by  comparing  (25)  and  (26).  For  this  reason  and  for  the  sake  of  variety  we  use  R 
here.  We  also  focus  on  the  simpler  Lambertian  case  in  what  follows,  although  the  specular 
case  is  analogous. 


4.2  The  Model 


Our  output  equation  (39)  dictates  that  the  state  vector  must  contain  at  least  R  and 
n.  We  must  therefore  determine  how  these  vectors  change  due  to  motion.  Our  experience 
with  the  previous  model  revealed  the  advantages  of  a  discrete  motion  representation.  We 


assume  that  the  camera  undergoes  a  translation  Pk  and  rotation  t  at  time  k.  What  is  the 

O  0  o 

state  equation  for  Rk,  i.e.  how  does  Rk+i  depend  on  R^  This  is  simply  the  object  motion 
equation  (30): 

Rk+i  +  (41) 

We  must  determine  the  analogous  relationship  for  the  change  in  the  unit  surface  normal 
nk-  A  unit  normal  remains  unchanged  under  translation  of  the  coordinate  system  -  it  will 

O 

merely  be  rotated  by  q  according  to 


0  0  o  o* 

nk+i  =  QknkVk- 


We  can  now  summarize  our  model  from  (41),  (42)  and  the  output  from  (39)  as 
Xjt+l  =  |  VT‘  1  =  1  o  0"  o'-'  "  |  =f(Xfc,Ufc) 


pnd?Z£ 

Vk  =  Ek  =  -^-yr (l  *  n*)  ~  sM 


Rk+i 

0  0  o  0* 

Pk  +  QkRkQk 

.  nk+ 1  . 

O  0  o* 

Qk’nk'Ik 

(42) 


(43) 


(44) 


O  O  .  p  0  0  o 

where  the  input  is  u*  =  [p*,9*]  .  This  model  contains  some  redundancy  since  R,  P  and  n 
are  vectors,  i.e.  purely  imaginary  quaternions  (we  can  omit  their  first  components  in  the 

O 

model),  n  is  a  unit  vector  (we  only  need  two  of  its  components)  and  q  is  a  unit  quaternion 
(we  only  need  three  components  to  represent  it).  So  a  minimal  model  could  have  the  state 
and  input 


xfc  = 


Vk 

Zk 

^xk 

nyk 


U  k  = 


Pxk 

Pyk 

Pzk 

qxk 

qyk 

Qzk 


(45) 


This  dynamical  system  describes  the  change  in  brightness  for  the  projection  of  a  particular 
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point  at  position  vector  R  as  a  result  of  translation  P  and  rotation  q.  As  before,  we  are 
interested  in  recovering  the  depth  Zk  and  finding  a  filter  suitable  for  performing  this  task. 


4.3  Estimating  Depth  using  an  Extended  Kalman  Filter 

The  extended  Kalman  filter  is  an  observer  for  a  nonlinear  system  corrupted  by  noise 
with  certain  probabilistic  assumptions  about  the  distribution  of  the  noise  and  the  initial 
values.  The  filter  equations  are  summarized  in  table  (21). 

We  assume  that  our  system  (43)  is  corrupted  by  noise  w*  which  is  iV(0,  Q*)  distributed, 
so  that 


Similarly,  the  measurements  are  influenced  by  noise  vk  which  is  N(0,Vk)  distributed.  The 
measurement  equation  becomes 

Vk  =  a(xk)  +  Vk-  (47) 

We  finally  assume  that  the  initial  value  of  the  state  x0  is  normally  distributed  around  our 
initial  guess  xo  with  known  covariance  Po  and  that  the  noise  processes  influencing  stale 
and  output  are  uncorrelated:  E[v/kVk]  =  0  for  all  k. 

We  can  make  these  assumptions  in  our  particular  case  but  must  deal  with  some  of  the 
issues  explicitly  before  proceeding.  A  good  initial  state  estimate  i.e.  x<),  is  not  available 
without  additional  information.  We  can  only  guess  it,  but  we  can  run  our  filter  with 
several  different  initial  values  and  select  those  that  converge.  Similarly,  the  covariance 
Po  of  the  initial  state  estimate  is  unknown  and  must  be  guessed.  The  covariance  Q  of 
the  state,  however,  may  be  assumed  to  be  zero  since  the  kinematic  equations  are  geometric 
relationships  and  not  subject  to  noise.  We  may  have  some  knowledge  about  the  uncertainty 
in  our  velocity  information  which  enters  into  the  kinematic  equations.  This  in  turn  would 
result  in  a  non-zero  covariance.  The  measurement  variance  r  is  due  to  noise  in  the  camera 
sensor.  A  noise  model  and  the  corresponding  variance  is  presented  in  section  6. 

We  find  that  the  state  estimate  propagates  as 


*fc+i  =  W,ufc) 


and  the  covariance  as 


PH.=#*W  +  Qi. 

Between  propagation  we  update  the  estimates  from  x^  to  x£  according  to 


=  h  +  ek[yk  -  g(Zk )] 


and  the  covariance  according  to 


Pjf  =(I-e*c2’)P£ 


where  the  Kalman  filter  gain  is  computed  as 

e*  =  pkck[clpkck  +  rk]~'.  (52) 

These  equations  contain  the  Jacobians  ^  I  and  c k  =  ^  .  . 

Xjl; 

In  order  to  apply  this  algorithm  we  need  merely  compute  the  matrices  and  c.  In  the 
case  of  the  minimal  system  with  state  and  input  from  (45)  we  have 

(<7o  +  <?X  -  <?y  -  <?2)  2(<lx<ly  -  <7o7?)  2(qxqz  +  q0qy) 

2(<7x9y  -  <7o<7*)  (<?0  -  <7r  +  ffv  -  </*)  "KWz  -  qoqx) 

*  =  2(?x<72  -  qoqy)  2 (qyqz  -  q0rlx)  (ql  -  qj  -  q2y  +  q2)  (53) 

ooo 

ooo 


1 

I 


g! 

Is 

& 


AV«’Av;/;Av.y,v;.'^ 


(?0  +  9x  -  9y  -  ?*) 
2(9x9y  -  <?o<7*) 


2(?r9y  90?z) 

{ql-ql  +  q2y-ql) 


pnd2Z4 

|R|4 


_i-n  x- 
iRp  A 
i  n  x/ 

.  IRFy 

i  n  A2+12 

5IT|i 


where  we  have  omitted  the  hat  as  an  indication  of  estimation  and  the  time  index  k. 

Let  us  finally  consider  how  the  measurement  and  filter  process  will  be  implemented. 
The  plant  equation  (43)  describes  how  a  point  P  and  the  surface  normal  at  that  point 
change  relative  to  the  camera  due  to  motion.  The  Kalman  filter  algorithm  will  therefore 
recover  the  depth  Z  of  that  particular  point.  Consequently,  the  brightness  measurement 
used  in  the  filter  must  be  the  brightness  at  the  location  of  the  projection  of  P  into  the 
image  plane.  Since  this  projected  point  moves  throughout  the  measurement  process,  we 
would  have  to  try  to  estimate  the  position  of  the  point  in  every  frame  and  then  take  the 
measurement  there,  which  seems  rather  error-prone.  To  obtain  a  dense  depth  map,  we 
must  repeat  this  for  all  points  in  the  region  of  interest. 

Instead,  we  will  store  the  current  depth  estimate  with  every  grid  point  of  the  image 
array  and  use  the  brightness  value  at  that  location  to  update  the  state  estimate  in  the 
Kalman  filter.  The  new  depth  estimate  will  then  be  associated  with  the  location  ( x,  y )  = 
(X / Z ,Y / Z)  which  is  the  position  we  expect  the  projection  of  P  to  move  to  according  to 
our  estimate.  This  position  may  not  lie  at  a  grid  point  so  we  must  interpolate  to  obtain 
the  values  at  the  grid  points.  The  details  of  this  interpolation  are  discussed  in  section  6. 
We  can  summarize  the  filter  algorithm  (there  is  one  filter  for  every  point  observed)  as 
follows: 

(1)  Set  initial  values: 

For  every  point  1  tom  under  consideration  set  xj  =  0  unless  better  initial  guess  available. 

(2)  Filter  loop: 

For  frames  k  =  1  to  n  do 

For  all  points  1  to  m  do 

(2.1)  Measure  the  brightness  values  Et-. 

(2.2)  Compute  the  system  matrices  and  c *.  from  equations  (54)  and  (55)  for  the 
current  state  estimate  x£  and  the  known  input  u^. 

(2.3)  Compute  the  state  estimate  x^+1  and  the  covariance  P^+1  for  time  k- f  1  from 
equations  (48),  (49). 
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(2.4)  Compute  the  filter  gain  e*.+  i  from  equation  (52). 

(2.5)  Update  the  state  estimate  to  x£+1  and  the  covariance  to  P£+1  using  equations 
(5°).  (51). 

(2.6)  Associate  the  new  state  with  image  location  {Xt+\l %k+\'  *fc+i/^Jk+i)  an^  'n* 
terpolate  the  state  estimates  at  integral  grid  point  locations. 


Gridpoint 

Inter- 


Figure  7:  A  block  diagram  of  the  Kalman  filter 


A  block-diagram  of  this  algorithm  is  shown  in  figure  (7).  There  is  one  such  filter  loop 
for  every  point  in  the  region  of  interest. 


4.4  Discussion 


We  can  easily  see  that  the  above  algorithm  runs  in  0(nm)  time  where  n  is  the  number 
of  frames  and  to  is  the  number  of  points.  This  is  clearly  optimal  for  surface  estimation  that 
makes  no  assumption  about  the  surface  structure  (i.e.  we  must  look  at  every  point  in  every 
frame).  A  parallel  implementation  of  the  algorithm  could  assign  one  processor  to  every 
image  plane  pixel,  which  can  compute  the  observer  matrices  for  those  points  in  parallel. 
Every  processor  will  also  evaluate  the  new  state  estimate,  which  determines  the  location 
of  the  processor  to  operate  on  the  pixel  in  the  next  frame.  The  current  estimate  must  be 
transmitted  to  that  processor.  Since  interframe  changes  will  be  small  for  high  frame  rates 
it  is  sufficient  that  locally  neighboring  processors  be  interconnected  in  order  to  execute  this 
transmission  efficiently.  The  parallel  complexity  is  then  merely  Q(n). 

So  besides  being  fast,  the  algorithm  makes  no  assumptions  about  surface  structure  and 
can  produce  a  dense  depth  map.  The  interesting  properties  (as  well  as  the  deficiencies)  of 
the  algorithm  given  above  are  summarized  here: 

(1)  We  use  a  discrete  model  and  avoid  discretization  errors. 

(2)  The  use  of  brightness  values  as  output  permits  us  to  operate  directly  on  the  measured 
data  but  also  requires  knowledge  of  surface  and  illumination  parameters. 

(3)  We  assume  that  motion  is  known. 

(4)  Noise  is  modelled. 

(5)  Depth  is  recovered  using  a  nonlinear  Kalman  filter  that  is  optimal  in  the  presence  of 
noise. 

(6)  Efficiency  and  perhaps  convergence  rely  on  a  good  initial  guess  for  the  state  vector. 

From  this  perspective  we  can  also  see  why  the  filter  algorithms  given  by  Broida  and 
Chellappa  [6],  [7]  require  feature  correspondences  between  frames  to  be  precomputed.  In¬ 
deed  if  we  had  those  correspondences  here,  the  performance  of  the  algorithm  would  increase 
significantly  because  we  no  longer  have  to  estimate  the  position  to  which  a  given  point  will 
move  in  the  next  frame  and  interpolate  the  state  estimates  at  grid  points.  Conversely  the 
estimator  can  be  used  to  help  establish  feature  correspondences  because  we  estimate  where 
the  feature  has  moved  to  in  the  subsequent  frame.  This  is  an  additional  interesting  applica¬ 
tion.  Only  tests  with  an  implementation  will  eventually  reveal  performance  and  accuracy. 
Problems  (1)  and  (2)  are  inherent  to  the  model  and  we  attempt  to  alleviate  them  in  what 
follows. 

5  A  Model  for  a  Planar  Surface  using  the  Brightness  Change  Constraint  Equa¬ 
tion 

We  have  seen  that  the  attempt  to  construct  a  precise  physical  model  of  the  imaging 
situation  forces  us  to  make  assumptions  that  cannot  always  be  guaranteed  for  real  images 
(for  instance  that  we  have  lambertian  surfaces).  We  will  use  the  approximation  of  the 
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brightness  change  constraint  equation  in  this  section  to  avoid  having  to  model  the  formation 
of  brightness  values  in  the  image  explicitly  via  surface  reflectance  properties.  Another 
improvement  the  previous  model  calls  for  is  to  drop  the  requirement  of  known  motion. 
This  is  addressed  by  assuming  a  very  simple  dynamical  model  of  the  motion  which  allows 
us  to  incorporate  motion  into  the  plant  equation.  Finally,  we  assume  that  a  planar  surface 
is  being  viewed  which  changes  the  nature  of  the  problem  from  having  to  estimate  a  dense 
depth  map  to  estimating  a  small  number  of  parameters  for  the  plane. 


5.1  The  Discrete  BCCE:  An  Implicit  Output  Equation 


Horn  and  Schunck  popularized  the  use  of  the  brightness  change  constraint  equation 
(BCCE) 

dE 

IT  = 0  <56> 


in  their  work  on  optical  flow  [27].  Negahdaripour,  Weldon  and  Horn  [29],  [42]  employed 
the  BCCE  to  recover  motion  and  surface  parameters  instantaneously.  Here  we  investigate 
the  use  of  the  BCCE  as  an  output  equation  of  a  dynamical  system.  The  BCCE  can  be 
expressed  in  terms  of  image  brightness  derivatives 


Er  •  r  +  Et  =  0  (57) 

where  Er  =  [§f)§f,0]r  and  Et  =  We  wish  to  obtain  a  discrete  form  of  the  BCCE 
analogous  to  the  previous  equation.  The  Taylor  series  expansion  of  E  provides 


AE  =  Er  •  Ar  -f  EfAt  =  0  (58) 

and  assuming  that  we  have  measurements  at  discrete  times  k  =  0,1,...  we  obtain  Ar  = 
Th+i  —  r k  and  At  =  /;+!  —  &  =  1  so 


E rk  ■  (r/t+i  -  rfc)  +  Etk  =  0. 


(59) 


The  discrete  motion  field  r^+1  is  given  in  equation  (31)  in  section  3.3.  We  can  use  this 
equation  by  reexpressing  the  above  BCCE  in  terms  of  unit  quaternions.  This  presents  no 
difficulty  since  all  vectors  are  simply  replaced  by  corresponding  purely  imaginary  quater¬ 
nions.  Substituting  (31)  into  (59)  yields 


Erk 


k  +  (hhfk)Zk 

((k  +  (hhK)  z*)  •*) 


-  rk)  +  E(k  —  0 


and  after  some  rearrangement  wo  have 


(GO) 


0 

°  o  O*  Pk  .  O 

(fIkrkdk  +  ~y~)  ■  sk  —  0  (61) 

6k 

in  which  we  have  abbreviated  sk  =  Erk  +  ( Etk  -  Erk  ■  h)z  =  [0 ,Exk,Eyk,Etk  -  TkExk  ~ 
t/kEyk}T ■  This  is  the  discrete  brightness  change  constraint  equation. 


t'm  fc 


r 
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In  this  model  we  restrict  ourselves  to  planar  surfaces.  The  equation  of  such  a  plane  is 


where  n  is  a  normal  to  the  surface  and  l/|n|  is  the  distance  of  the  plane  to  the  origin. 
Substituting  the  quaternion  equivalent  into  the  discrete  BCCE  yields 

(hhh  +  •  h)Pk)  •  Sk  =  0.  (G3) 

This  relationship  depends  on  a  particular  point  in  the  image  plane.  Since  we  wish 
to  use  the  BCCE  to  make  statements  about  the  plane  and  its  motion  thal  are  obviously 
independent  of  particular  points  in  the  image  plane,  the  question  arises  at  which  point  in 
the  image  plane  (63)  should  be  evaluated.  Since  no  point  is  distinguished  we  may  choose 
one  at  random.  A  better  approach  is  to  sum  up  the  values  over  a  region  of  the  plane. 
Of  course  then  we  must  square  them  before  adding  to  avoid  cancellation.  This  could  be 
written  as 

£  ElAUl +  (»*•?**)•  ^l2  =  0  (64) 

X=-W  y=-h 

where  w  and  h  are  height  and  width  of  the  image  region  under  consideration. 

We  see  that  this  equation  implicitly  relates  the  measurable  quantities  s*  to  the  motion 

o  O  o 

and  surface  parameters  P ,  9  and  n  at  every  image  plane  point  r.  So  if  we  interpret  the 
former  as  an  output 

y  —  s  (65) 


and  the  latter  as  a  state 


we  have  a  relationship  of  the  form 

ff(x,  y)  =  0.  (67) 

We  have  used  vector  symbols  here  to  denote  the  imaginary  parts  of  the  above  quaternions. 

o  o  o  .  o  t  _ 

Since  s ,  p  and  n  are  purely  imaginary  and  9  is  a  unit  quaternion  it  suffices  to  include 
the  three  imaginary  components  in  the  state  -  the  remaining  component  is  either  zero  or 
related  by  some  algebraic  equation.  Unlike  our  usual  output  equation  (2),  equation  (67) 
only  implicitly  relates  state  and  output  so  that  standard  observation  techniques  do  not 
apply.  Before  we  consider  modifications  of  our  observer  to  deal  with  this  problem  we  have 
yet  to  establish  the  state  equations. 

5.2  The  State  Equations  for  the  Moving  Plane 

O  O  0 

How  do  the  three  components  n,  P  and  9  of  our  state  vector  change  in  time?  Let  us 
first  determine  how  the  surface  normal  will  change  due  to  the  motion.  This  is  not  trivial 
since  we  use  the  inverse  length  of  n  to  express  the  distance  from  the  origin. 


'•W 


For  pure  rotation,  the  distance  between  plane  and  origin  does  not  change  and  the  normal 

O 

is  simply  rotated  by  </*..: 

o  O  o  O* 

«*+ 1  =  <Iknkqk.  (68) 

For  pure  translation,  the  orientation  of  the  normal  remains  unchanged  but  the  distance 

O 

changes  by  the  amount  of  the  translation  displacement  Pk  projected  onto  the  unit  normal 
along  u  so 


which  can  be  rearranged  to 


1  1  o  nk 

IVhI  i»*i  k  l»*l 


,0  ,  _  K| 

l^'fc+ll  0  o 

1  +  Pk  ■  nk 


Since  a  unit  normal  along  n  would  remain  unchanged  by  translation,  we  have 


o  nk  o  . 

nk+i  =  -5-tI»*+iI  = 


O  o 

1  +  Pk  •  nk 


In  this  model  we  will  not  interpret  the  motion  parameters  P  and  q  as  input  but  as  states. 
This  allows  us  to  recover  them  with  our  observer  but  we  must  make  some  assumption  about 
the  dynamics  of  these  vectors  in  order  to  formulate  a  state  equation.  In  a  general  case  we 
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could  say  that  Pk  and  Qk  are  the  solution  to  some  simple  difference  equation 

Pk+i  =  P  Pk  and  qk+l  =  Q  qk  (72) 

which  leaves  the  problem  of  finding  the  elements  of  P  and  Q.  In  the  case  of  a  moving 
camera  these  matrices  are  given  by  the  dynamics  of  the  actuating  device  (mobile  robot, 
motion  platform  etc.).  We  believe  that  this  interpretation  is  worth  pursuing  since  it  has 
many  practical  applications.  Here  we  will  content  ourselves  with  an  approximation.  We 
assume  that  the  motion  vectors  will  not  change  significantly  between  frames  because  the 
sampling  rate  is  high  compared  to  the  kinematic  changes  so  that  the  matrices  P  and  Q  are 
null  matrices.  We  then  have  a  special  form  of  (72): 


Pk+ 1  =  Pk 


'1  <7*+i  =  < h ■ 


We  can  now  summarize  our  state  equations  from  (71),  (73)  and  the  implicit  output 
equation  from  (64) 


Pk+ 1  = 


0  0 

\  +  Pknk 


f  (Xfc) 


>VV, 


g{xk,yk)  =  g(  pk  ,h)  =  £  £[&?A  +  A  A)Pk)  ■«?  =  <>.  (75) 

.  h  J 

As  we  already  mentioned,  we  can  restrict  ourselves  to  the  imaginary  components  of  the 
quaternions  in  the  above  relationships. 

5.3  A  Nonlinear  Observer  for  the  Moving  Plane 

It  is  clear  that  if  we  succeed  in  constructing  an  observer  for  this  system  we  can  recover 
depth  and  motion.  However,  the  output  equation  is  not  of  the  desired  form  and  depends 
on  the  image  plane  point  at  which  the  brightness  gradients  are  measured.  The  solution  to 
this  problem  is  to  modify  the  observer  such  that  it  can  handle  the  implicit  output  equation. 

Let  us  recapitulate  the  basic  idea  behind  the  observer  in  section  2.7.  The  observer  was 
basically  a  copy  of  the  dynamical  system  under  observation  corrected  by  a  matrix  multiple 
of  the  error  between  actual  and  estimated  output.  In  our  case  a  discrepancy  between  the 
“real”  state  x  and  the  estimated  state  x  will  result  in  a  non-zero  value  for  g(x,y).  So  we 
can  simply  use  g(x,y)  as  an  error  term  in  the  discrete  version  of  (19): 

xfc+1  =  f(xjt)  +  eg(xk,yk).  (76) 

The  estimation  error  £k  =  xk  -  x^  is  found  to  be 

£k+ 1  =  f(xfc)  -  f(xfc)  +  e(g(x k,yk)  -  g(xk,  yk))  (77) 

where  we  have  exploited  g(xk,  yk)  =  0  to  maintain  symmetry.  The  nonlinearity  of  state  and 
output  however  prevent  us  from  applying  the  simple  rules  presented  in  section  2.7  for  the 
calculation  of  e.  Using  the  ideas  of  the  extended  Kalman  fib  we  expand  the  non-linearities 
about  the  current  estimate 

A  £» 

f(xfc)  =  f(xfc) -f  —  (x*.  -  xk)  4  ...  «  f(xfc)  +  Afc(xfc  -  x*)  (78) 

OX  Xfc 
Q(r 

g(xk,yk)  =  <7(X)t,y*))  +  (xk  -  xk)  +  . ..  *  g{xk,yk))  +  ck(xk  -  xk)  (79) 

°x  x*.y 

which  we  substitute  into  equation  (77): 

£k+l  =  Akek  -  e(ck  ■  £k)  =  (A*  -  ec l)£k.  (80) 

The  error  difference  equation  is  time-varying  and  in  order  to  achieve  a  good  dynamic 
behavior  we  must  adjust  e  with  every  measurement,  so  we  have  e  =  ek.  The  idea  is  then, 
to  compute  Ak  and  ck  for  every  new  estimate  xk,  then  select  e*  such  that  the  eigenvalues 
of  Ak  —  ekcl  are  located  at  the  origin  and  finally  compute  the  new  estimate  x*+i-  The 
matrices  A  and  c  have  been  computed  in  the  appendix  A.  We  then  have  the  following 
algorithm: 
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(1)  Set  initial  values: 

Set  xo  =  0  (unless  better  information  is  available). 

(2)  Observer  loop: 

For  frames  k  -  1  to  n  do 

(2.1)  Compute  system  matrices: 

Compute  Afc  and  ck  using  the  formulas  (139)  and  (141)  for  the  current  estimate 
Xfc  and  measurement  yk. 

(2.2)  Compute  observer  gain: 

Compute  the  vector  ek  such  that  all  eigenvalues  of  Ak  -  ekc l  are  zero. 

(2.3)  Compute  next  estimate: 

Compute  the  next  estimate  kk+i  from  equation  (76). 

Since  we  are  not  using  a  filter,  the  algorithm  becomes  simpler  which  is  reflected  in  the 
block-diagram,  figure  (8). 


1  e 

Delay 

k 

^-1 

ek+l 

Estimate 

Update 


(139), (141) 


Figure  8:  A  block  diagram  of  the  observer  algorithm 


5.4  Discussion 


As  before,  we  would  like  to  make  a  statement  about  the  complexity  of  this  observer 
algorithm.  Here,  however,  we  do  not  have  m  observers,  one  for  each  point  of  interest.  But 
we  have  an  image  region  of  width  2w  and  height  2h  over  which  we  are  sampling.  If  again  we 
denote  the  number  of  frames  by  n,  the  algorithm  will  run  in  Q(whn )  since  one  measurement 
for  every  image  point  must  be  processed  to  compute  the  vector  c  in  (2.1)  in  every  frame. 
Comparing  the  complexities  is  rather  useless  since  the  results  and  assumptions  are  quite 
different  from  those  in  the  preceding  algorithms.  A  parallel  implementation  that  provides 
an  efficient  summation  of  the  expressions  in  the  components  of  c  could  reach  a  parallel 
complexity  of  0(n). 

We  also  note  that  a  closed  form  solution  for  the  instantaneous  problem  (2  frames)  using 
velocities  t  and  u  exists  (cf.  Negahdaripour  and  Horn  [42]).  It  may  be  used  to  provide  a 
good  initial  value  for  the  observation  algorithm.  On  the  other  hand,  an  observer  algorithm 
as  given  above  must  compete  with  the  repeated  application  of  the  instantaneous  algorithm. 
Applying  Negahdaripour ’s  solution  to  n  frames  also  requires  Q(whn)  operations.  A  com¬ 
parison  of  accuracy  can  take  place  once  implementations  of  both  schemes  are  available. 
In  doing  this,  we  must  observe  one  important  detail:  The  closed-form  solution  has  been 
shown  only  for  instantaneous  velocities,  not  finite  displacements  and  rotations  as  given  in 
this  section.  If  an  iterative  scheme  is  necessary  to  recover  the  latter  from  two  frames,  this 
would  clearly  increase  the  complexity. 

We  summarize  important  properties  of  the  above  algorithm: 

(1)  Incremental  recovery  of  surface  orientation  and  motion  parameters. 

(2)  Applicable  only  to  planar  surfaces. 

(3)  Motion  parameters  are  assumed  to  be  constant  over  time. 

(4)  The  model  uses  finite  translations  and  rotations  represented  by  quaternions. 

(5)  The  discrete  brightness  change  constraint  equation  is  formulated  and  employed  as  an 
implicit  output  equation. 

(6)  A  nonlinear  observer  is  presented  to  handle  implicit  output  equations. 

From  the  theoretical  point  of  view,  this  is  the  most  appealing  of  the  models  presented  in 
this  paper  as  it  recovers  surface  parameters  rather  than  distinct  depth  values.  On  the  other 
hand  it  only  applies  to  the  special  case  of  a  plane  in  motion.  Perhaps  a  fusion  of  both  will 
provide  interesting  results. 

6  Integrated  Motion  and  Depth  Estimation 

The  previous  examples  have  shown  the  duality  between  motion  and  depth:  If  we  know 
the  depth,  we  can  recover  the  motion  and  vice  versa.  But  if  we  try  to  solve  both  problems 
simultaneously  we  must  impose  constraints  that  restrict  the  solutions  to  rather  specialized 


}' 

S 


I 


IV 

S' 

Cl 


it 

IS 

1 


ft 


.Si 


\ 

a 

•I 

a 


cases.  A  conclusion  might  be  to  separate  the  two  compulations  and  feed  one  with  the  result 
of  the  other  in  an  iterative  scheme. 

We  have  seen  further  that  the  Kalman  filter  is  not  particularly  well  suited  for  motion 
estimation  since  no  dynamical  model  of  the  behavior  of  the  motion  parameters  is  available. 
The  idea  in  this  section  is  therefore  to  estimate  depth  using  the  Kalman  filter  and  update 
the  motion  estimate  to  fit  the  depth  map  in  a  least-squares  sense  in  every  iteration. 

The  algorithms  in  this  section  make  no  assumptions  about  surface  or  motion  other  than 
the  ones  inherent  to  the  BCCE  and  are  therefore  most  likely  to  be  successful  on  real  imagery. 
For  this  reason  we  discuss  all  the  necessary  details  in-depth  as  for  instance  measurement 
variances  and  pixel  interpolation.  We  have  tailored  the  scheme  to  two  different  implementa¬ 
tion  environments:  one  in  which  only  brightness  values  are  available  as  measurements  and 
the  other  in  which  a  correlation-based  displacement  estimator  is  used.  Conceptually  both 
approaches  follow  the  same  idea  but  they  have  different  efficiency  and  stability  properties. 


6.1  Using  brightness  measurements  directly 

The  brightness  change  constraint  equation  is  the  foundation  of  this  approach.  We  use 
it  in  its  differential  form 

dF 

-  =  ErT+£(  =  0  (81) 

where  r  is  the  motion  field  from  equation  (26).  If  we  substitute  the  motion  field  for  r  we 
obtain 

c  +  v  •  «  +  i(s  •  t)  =  0  (82) 

after  some  rearrangement  (refer  to  Negahdaripour  and  Horn  [42]  for  the  derivation).  We 
have  used  the  following  abbreviations:  c  -  Et,  s  =  (Er  x  z)  X  r  and  v  =  -s  X  r.  The  reader 
will  have  noticed  that  we  have  again  resorted  to  the  continuous  case.  The  reason  for  this 
lies  in  the  fact  that  the  brightness  change  constraint  equation  is  simpler  in  this  form  and 
yields  a  closed-form  solution  for  the  motion  when  depth  is  known. 

We  must  also  specify  the  dynamical  system  that  will  be  used.  It  will  be  one-dimensional 
with  the  sole  state- variable  Z(t).  The  plant  equation  is  simply  the  third  component  of  the 
kinematic  equation  (25)  i.e. 

Z  =  tz  +  ( u>xy  -  u>yx)Z.  (83) 

Z  will  also  be  our  measurement  so  we  have  a  very  simple  model.  There  is  a  slight  problem 
with  this  formulation,  however,  which  lies  in  the  fact  that  x  and  y  also  vary  in  time  and 
should  be  included  in  the  state.  We  avoid  this  by  interpreting  them  as  time- variant  param¬ 
eters  that  must  be  updated  in  every  iteration  according  to  the  motion-field  equation  (26). 
What  this  means  is  that  the  depth  prediction  Z  according  to  the  above  plant  equation  will 
be  valid  at  the  location  specified  by  the  new  coordinates  (z7,  y ')  which  may  not  lie  on  a  grid 
point  of  the  image  array.  This  leads  to  the  problem  of  having  to  interpolate  the  depth  as 
we  mentioned  in  section  4.  We  discuss  the  treatment  of  these  problems  in  detail  below. 

Now  consider  the  block  diagram  of  figure  (9).  A  new  image  arrives  at  iteration  k  and 
is  fed  into  the  depth  estimator.  Using  the  motion  estimate  from  k  —  1  we  can  solve  the 
brightness  change  constraint  equation  (82)  for  Z  at  every  image  point  and  obtain  a  dense 
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Figure  9:  Integrated  depth  and  motion  estimation  based  on  brightness  values 


depth  measurement.  This  is  used  in  the  update  stage  of  a  Kalman  filter  to  produce  a  depth 
estimate.  Note  that  in  this  case  the  state  variable  being  estimated  and  the  measurement 
quantity  are  identical.  The  Kalman  filter  is  merely  used  as  an  update  algorithm. 

The  updated  depth  map  is  used  to  compute  the  motion.  We  compute  the  motion  such 
that  the  error  in  the  brightness  change  constraint  equation  (the  deviation  from  zero)  is 
minimized  in  a  least-squares  sense  over  the  region  of  interest.  Finally  we  use  the  Kalman 
filter  prediction  to  compute  the  expected  depth  map  in  the  next  measurement  which  is 
made  available  to  the  filter’s  update  stage.  We  now  discuss  the  various  modules  in  detail. 

6.1.1  Depth  Estimation 

Estimating  depth  proceeds  in  a  straightforward  manner  based  on  the  brightness  change 
constraint  equation  (82).  We  solve  for  Z  and  obtain 


The  quantities  c,  v  and  s  depend  on  the  spatial  brightness  gradients  Ex  and  Ey  and  the 
temporal  brightness  gradient  Et  which  can  be  approximated  using  finite  differences  of  the 
image  brightness  values.  The  motion  parameters  t  and  u>  are  taken  from  the  previous 
motion  estimate.  We  thereby  obtain  a  dense  depth  map. 

Since  this  Z  will  be  the  measurement  quantity  in  our  Kalman  filter,  we  also  need  its 
variance.  A  closer  look  at  (84)  reveals  that  noise  in  the  measurement  of  the  brightness 
values  will  affect  Z.  Let  us  recall  a  basic  result  of  probability  theory.  Suppose  we  have  a 
random  variable  Z  which  is  a  function  of  a  collection  of  random  variables  xi,. . .  ,xn  so  that 

Z  =  f(x  i,. (85) 

Using  a  Taylor  series  expansion  of  /  around  the  “true”  value  of  Z  we  find  that 

4  =  (86) 

,=i  j=i  ox'  oxi 

where  cov(xi,xj)  denotes  the  covariance  of  the  two  random  variables.  Now  if  we  assume 
that  the  X{  are  independent,  which  is  usually  the  case  in  measurement  processes,  we  have 
cov(xi,Xj)  —  0  for  i  ^  j  and  cov(xi,Xj)  =  a2Xx  for  i  =  j.  We  then  have 

t=l  ax' 

In  our  case  the  brightness  measurements  are  interpreted  as  random  variables  and  we 
are  interested  in  determining  how  the  variance  in  the  brightness  measurements  will  affect 
the  variance  of  the  depth  estimate.  We  must  find  the  immediate  dependency  of  depth  and 
brightness  in  equation  (82).  We  substitute  the  expressions  for  c,  v  and  s  into  (82)  and  find 

r,  Ex{tx  ~  xtz)  T  Ey(ty  —  ytz) 


Et  +  Ex(xyu>x  -  (1  +  X2)uy  +  yuz)  +  Ey((l  +  y2)u>x  -  xyojy  -  xu>z) 

We  abbreviate  the  coefficients  of  the  brightness  gradients  to  obtain  a  more  compact  expres¬ 
sion 

=  aEx  +  bEy 

J  Et  +  cEx  +  dEy -  (89) 

The  brightness  gradients  are  estimated  through  some  finite  difference  approximation  as  for 
instance  the  one  suggested  by  Horn  and  Schunck  [27]  : 

E*  ~  4^{E(x  +  dx,  y,  t)  -  E(x,  y,  t)  +  E(x  +  dx,  y  +  dy,t)  -  E(x,y  +  dy,  t)-f 

E(x  +  dx,y ,  t  -(-  7  )  —  E(x,  y,t.  +  T)  -+■  E(x  4-  dT,  y  4-  dy,  t  4-  T)— 

E(x,y  +  dy,t.  +  T)) 

Ey  ~  ^(E(T,y  +  dy,t)  -  E(x,  y,t)  +  E(x  +  dr,y  +  dy,t)  -  E(x  +  dx,  y,t)+ 

E(x,y  +  dyJ  +  T)  -  E{x,y,1  +  T)+  E{x  +  dx,y  4-  dy,  t  +  T)-  (90) 

h{x  +  dx,  y,  t  -f  T )) 

Et  a  ^r(E(x,yJ.  +  T)  -  E(x,y,  t)  +  E(x,y  +  dy,  t  +  T)  -  E(x,y  +  dy,t)+ 

E(x  -|-  dx,  y,  t  -f  I )  —  E{x  +  dx,  xj,  t )  -f  E(x  4-  dx,  y  4-  dy ,  t  +  T)— 

E(x  +  dT,y  +  dy ,  <)). 
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We  have  denoted  the  distance  between  two  pixels  in  the  x  direction  by  dx ,  in  the  y  direction 
by  dy  and  the  inverse  frame  rate  by  T.  Note  that  this  approximation  is  the  average  over  4 
first  differences  along  the  edges  of  a  cube  in  spatio-temporal  hyperspace.  The  approximation 
is  valid  at  the  center  of  that  cube. 

We  can  apply  our  formula  (87)  for  the  propagation  of  variances  to  calculate  the  variance 
in  Z.  Note,  however,  that  the  Ex,  Ey  and  Et  are  not  independent  and  can  therefore  not 
be  used.  We  must  express  Z  in  terms  of  the  E{  (the  brightness  values  used  in  the  gradi¬ 
ent  approximations  (90))  which  we  may  assume  to  be  independent  identically  distributed 
random  variables  with  variance  o\.  We  then  find  the  variance  in  Z  to  be 
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2  _  dZ  .  2  2  _  /  A  dEx  dZ  dEy  dZ  ->  dEt  ,2 

°2  ~  hidE'  °E  ~  KdE*  hi dEi  dEv  hi  dE<  +  dEt  hi dE> 


which  evaluates  to 


l  2  (aEt  +  (ad  ~  bc)Ev )2  j?  +  (bEt  +  (be  -  ad)Ex )2  +  (aEx  +  h£y)2 

°Z  =  2°E  (Et  +  cEx  +  dEy)4  ^  ' 


(91) 


(92) 


We  have  determined  how  the  variance  in  the  depth  is  related  to  the  variance  in  the 
measurements.  But  what  is  the  variance  in  the  measurements?  One  component  of  the 
measurement  noise  can  be  modeled:  the  quantization  noise.  Since  our  sensor  discretizes 
brightness  values  into  gray-levels,  values  that  do  not  coincide  with  a  discretization  step  will 
be  “rounded  off”  to  the  next  step.  In  our  model  of  the  sensor,  the  discrete  sensor  output 
values  will  be  denoted  by  Ei  and  the  constant  quantization  step  by  A E  =  -  E,.  If  we 

assume  that  our  sensor  discretizes  in  the  following  fashion 
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■t'+x  if  ^actual  >  Ei  +  AE/2 
if  Eactual  5;  Ei  +  AE/2 


(93) 


and  that  brightness  values  are  equally  likely  in  (£?,-,  Iw+i]  the  expected  sensor  output  is 
easily  determined  as  (£«+i  +  £,)/ 2  and  the  variance  as 

(AEf 


°E  = 


(94) 


However,  this  models  only  one  possible  source  of  noiseand  distortion.  Others  include  the 
electrical  circuitry  in  the  sensor  or  simply  blurring  or  defocusing.  So  instead  of  attempting 
to  model  all  these  various  noise  sources,  we  can  simply  conduct  a  measurement  in  which 
a  uniformly  colored  simple  surface  is  viewed  to  provide  a  set  of  measurements  Ei,..., En. 
We  then  determine  the  expected  value 
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E(E)=-Y.E, 
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and  the  variance 


ol=l-±(Ei-E(E)j>. 


(95) 


(96) 
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Using  these  results  we  can  obtain  a  dense  depth  map  where  every  depth  value  is  accompa¬ 
nied  by  a  variance  that  will  be  used  by  the  filter  we  describe  below. 
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6.1.2  Kalman  Filter  Update  of  Depth  Map 

In  order  to  update  our  estimate  Zk  we  must  first  compute  the  filter  gain  from  (21). 
Since  the  system  is  one-dimensional,  this  is  a  scalar 

ek  =  Pk(Pk  +  rk)~l  =  -----  (97) 

Pk  +  rk 

where  rk  is  the  variance  of  the  measurement.  We  are  measuring  the  depth  so  rk  =  a\ 
which  was  computed  in  the  previous  subsection. 

The  filter  update  equations  from  (21)  are  also  extremely  simple: 

Zk  ~  +ek{Zk~Zk)  (98) 

p+  =  (1  ~ek)p~k.  (99) 

We  now  have  a  new  depth  map  and  also  a  covariance  map.  The  former  is  used  in  the 
motion  estimator  described  below. 

6.1.3  Motion  Estimation 

This  module  must  solve  the  following  problem:  given  the  depth  at  every  image  point, 
what  is  the  global  relative  motion  t,  u  between  camera  and  environment.  Under  the 
assumption  of  the  validity  of  the  brightness  change  constraint  equation,  this  problem  has 
been  solved  [29].  Assuming  that  the  region  of  interest  has  width  2 w  and  height  2 h  centered 
at  the  origin,  we  are  seeking  a  motion  estimate  t  and  u>  that  minimizes 

w  h  * 

E  X]  (c  +  v'w  +  j(s,t))2-  (100) 

x=— u>  y=-h 

Differentiating  with  respect  to  t  and  u  and  equating  to  0  results  in 

t v  h  +  w  h  w  h 

( E  E  (vsT)  +  ( Y,  E  =  -  E  E  vs 


x=  —  w  y=-h 
w  h 


x=-w y=—h 
w  h 


X—  —  W y—  —  h 

w  h 


w  h  tv  h  i  w  h 

(  E  E  (vvi'))w  +  (  E  E  (svT) 7)1  =  -  E  E  cv 


X=  -  W  y=  —  h 


x=-w  y—- h 


X—  —  tu  y— —  h 


a  6x6  system  that  can  be  solved  for  the  desired  parameters  in  general. 

6.1.4  Kalman  Filter  Prediction  of  Depth  Map 

Given  the  current  estimate  for  the  motion  parameters  and  the  updated  estimate  for  the 
depth  map  the  prediction  phase  of  the  Kalman  filter  produces  a  predicted  depth  map  for 
the  following  iteration.  Looking  at  the  Kalman  filter  (21)  we  see  that  we  need  predictions 
for  .  tate  and  covariance.  A  simple  first  differences  approximation  for  our  plant  (83)  yields 
the  following  discrete  form 

Zk+ 1  =  Zk  +  T(tz  +  (a.’xy  -  uyx)Zk)  (102) 
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which  is  the  prediction  equation  for  the  depth.  We  can  see  that  the  system  “matrix”  is 

<t>k  =  1  +  T(uxy  -  u>yx).  (103) 

It  is  needed  in  the  prediction  equation  for  the  covariance  which  reads 

Pk+i  =  i1  +  T(UJ^y  ~  U}yx)?Pk-  (1Q4) 

As  we  have  argued  previously  we  have  assumed  that  the  plant  equation  of  the  system  is 
not  noise  corrupted  as  it  is  purely  kinematic.  Therefore  qk  =  0. 

This  procedure,  however,  is  not  quite  correct.  We  have  neglected  the  fact  that  the  new 
estimate  Zk+ 1  will  not  be  valid  at  the  same  image  plane  location  ( x,y )  at  which  Z£  was 
stored  since  x  and  y  change  over  time.  To  make  things  worse,  the  position  at  which  our 
estimate  will  be  valid  may  not  coincide  with  a  grid  point  of  the  image  array.  So  we  must 
determine  where  the  point  at  which  we  are  estimating  the  depth  moves  to  and  interpolate 
the  depth  at  the  grid  points  from  this  transformed  depth  map. 


Figure  10:  Displacement  of  observed  points  between  successive  images 


The  movement  of  a  point  (x ,y)  in  the  image  plane  is  described  by  the  motion-field 
equation  (26).  A  finite  differences  discretization  yields 


x’  =  x  4-  T(  -  <r-^  —  -f  uxxy  -  uy(x2  +  1)  +  uzy) 


y'  =  y  +  T(  -  ty-~  ytz-  -  uyxy  +  vx{y2  -f  1)  - 
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which  provides  us  with  the  coordinates  (x',yr)  at  which  our  predicted  depth  is  valid. 
We  can  think  of  this  as  a  new  depth  map  which  evolves  from  the  old  one  by  the  prediction 
process  as  depicted  in  figure  10.  The  remaining  problem  is  to  interpolate  the  depth  and  the 
variance  at  the  grid  points  of  the  image  array.  We  investigate  two  methods  for  accomplishing 
this  in  which  we  focus  on  the  depth  interpolation. 

The  first  method  simply  fits  a  plane  to  the  data  in  a  least-squares  fashion.  Supposing 
the  depth  values  Z{  at  coordinates  (2:,, j/,)  for  i  —  1  are  within  some  neighborhood 

of  the  grid-point  we  are  considering.  We  locally  approximate  the  real-world  surface  by  a 
plane 

aX  +  bY  +  cZ  =  l  (107) 

and  using  the  perspective  projection  equations  we  have 


axZ  +  byZ  -f  cZ  —  1.  (108) 

We  seek  to  determine  the  parameters  a ,  b  and  c  such  that  the  deviation  of  our  depth  values 


from  the  resulting  plane 

n 

Zj  +  byiZi  +  cZi  —  l)2 

(109) 

is  minimized.  The  minimum 

can  be  found  by  differentiating  and  equating  to  zero.  This 

yields  the  following  3x3  system 
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which  can  be  solved  for  a,  b  and  c.  Then  the  depth  value  at  the  grid-point  x0>  J/o  is  easily 

computed  as 


Z0= 


axo  +  biyo  +  c' 

The  method  is  rather  intensive  in  terms  of  computational  cost.  We  must  compute  9 
different  sums  over  all  of  the  n  measurement  values  and  solve  a  3x3  linear  system  for  ever 
interpolation  point.  We  only  obtain  an  interpolation  for  a  planar  approximation. 

An  alternative  that  is  more  computationally  oriented  is  to  compute  the  grid-point  value 
as  some  weighted  average  of  the  estimates  in  its  neighborhood 


We  have  set  up  some  criteria  for  the  choice  of  the  weighting  function  tn,-  which  can  be  found 
in  appendix  B.  We  show  there  that 


fulfills  a  set  of  natural  requirements  that  one  would  demand  of  such  a  weighted  average. 
The  appendix  also  contains  a  linear-time  algorithm  for  computing  the  weights  and  the 
average  so  that  the  resulting  scheme  is  very  efficient. 

Other  interpolation  schemes  include  bi-linear  and  bi-cubic  interpolation.  See  Rifman 
and  McKinnon  [46],  Bernstein  [4]  or  Abdou  and  Wong  [1]  for  some  interesting  and  practi¬ 
cally  valuable  techniques. 

6.1.5  Discussion 

Our  scheme  for  integrated  motion  and  depth  estimation  is  now  complete.  The  charac¬ 
teristics  and  deficiencies  of  this  scheme  are  summarized  below. 

(1)  We  recover  a  dense  depth  map  using  a  Kalman  filter  and  a  motion  estimate  to  fit  the 
depth  map  in  a  least-squares  sense. 

(2)  The  validity  of  the  brightness  change  constraint  equation  is  assumed.  No  other  as¬ 
sumptions  about  surface,  reflectance  or  motion  are  made. 

(3)  We  need  no  optical  flow  or  displacement  estimator  -  the  algorithm  operates  directly 
on  brightness  values. 

(4)  Estimates  are  based  on  gradient  approximations  that  tend  to  have  little  numerical 
robustness. 

(5)  Depth  and  motion  estimates  rely  on  the  same  physical  relationship.  This  may  lead 
to  instability  of  the  iterative  estimation. 

Item  (4)  is  an  inherent  property  of  all  schemes  that  rely  on  the  differential  form  of  the 
brightness  change  constraint  equation.  An  alternative  is  to  consider  an  equivalent  inte¬ 
gral  constraint  to  obtain  higher  robustness.  The  6th  item  reflects  the  ”chicken-and-egg” 
problem  that  arises  when  we  sequence  motion  and  depth  estimation  rather  than  recover 
them  simultaneously.  Since  the  latter  cannot  be  achieved  in  general,  we  rely  on  the  filter 
to  provide  sufficient  convergence  to  render  the  effect  of  this  problem  negligible.  We  address 
these  concerns  by  slightly  altering  the  implementation  as  described  below. 

6.2  Using  SSD-based  displacement  estimates  as  measurements 

This  section  investigates  an  alternative  measurement  procedure  to  the  one  based  on 
the  differential  form  of  the  brightness  change  constraint  equation  described  in  the  previous 
section.  In  order  to  remedy  the  problems  with  the  differential  approach  (refer  to  items 
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5  and  6  in  the  discussion)  we  require  two  main  properties  of  an  alternative  formulation: 
measurements  should  be  based  on  information  from  a  larger  image  area  (an  integral  rather 
than  a  differential  approach)  and  the  measurement  process  should  be  independent  of  the 
motion  estimation. 


h! 


Figure  11:  Integrated  depth  and  motion  estimation  based  on  correlation-based  displacement 
measurements 


Figure  (11)  has  basically  the  same  structure  as  the  one  in  the  previous  subsection.  We 
have  replaced  the  depth  estimator  by  an  sum-of-squared-differcnces  (SSD)  displacement 
estimator.  The  output  of  this  system  is  a  vector  r*  =  (£fc,2/fc)  for  every  image  point  which 
describes  where  a  given  pixel  in  the  previous  image  has  moved  to  in  the  current  image. 
Given  these  measurements  we  are  forced  to  change  the  structure  of  our  dynamic  system 
to  produce  these  quantities  as  output.  We  resort  to  our  initial  formulation  of  the  object 
kinematics  and  have  chosen  the  discrete  formulation  (30)  for  the  sake  of  variety.  We  have 

Rk+i  =Pk  +  VkRk<ll  (114) 
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where  R  denotes  the  real  world  position  of  a  point  on  the  surface  being  viewed  and  p.  4 
represent  translation  and  rotation.  The  components  of  our  plant  are  therefore 


Xk+ 1  =  Px  +  422-Vfc  +  423%  +  424-^t'  (  1 15) 

%+ 1  =  Py  +  432  Xk  +  433  %  +  431^/t  (116) 

Z/fe+i  =  Pz  +  442^/fe  +  443%  +  444^*  (H7) 

and  the  output  is  simply 

z*  =  Xk/Zk  (118) 

Vk  =  Xk/Zk.  (119) 


The  qij  in  the  above  equations  are  the  elements  of  the  matrix  in  (29)  introduced  for  the 
quaternion  representation  of  rotations.  We  see  that  we  arc  trading  off  higher  accuracy  for 
complexity  of  the  system. 

6.2.1  SSD-based  displacement  estimation 

Displacement  estimation  is  related  to  optical  flow  estimation  in  that  displacements  arc 
the  product  of  the  instantaneous  velocity  and  the  inverse  frame  rate.  While  optical  flow 
estimation  has  proceeded  along  the  line  suggested  by  Horn  and  Schunck  (27],  displacement 
estimation  using  correlation- based  methods  or  sums-of-squared- differences  has  proven  to 
be  effective.  Comparisons  of  typical  correlation- based  matchers  can  be  found  in  Hannah 
[17]  and  Burt,  Yen  and  Xu  [9].  An  in-depth  study  of  the  applications  of  these  techniques 
to  the  estimation  of  displacement  fields  in  motion  sequences  is  presented  by  Anandan  [2]. 
In  particular,  correlation- based  estimation  has  proven  to  be  useful  when  large  interframe 
displacements  occur.  On  the  other  hand,  the  method  encounters  difficulties  with  foreshort¬ 
ening. 

We  employ  the  sum-of-squared-differences  (SSD)  technique  which  we  briefly  describe 
below  and  in  figure  (12).  Suppose  we  wish  to  determine  where  the  image  of  a  point  in  the 
real  world  that  was  located  at  coordinates  ( x,y )  in  frame  t  has  moved  to  in  frame  t  +  T. 
Let  the  new  coordinates  be  (x',y')  =  (x  +  A x,y  -f-  Ay).  We  assume  that  the  brightness 
in  a  neighborhood  of  the  point  of  interest  does  not  change  significantly  from  one  image  to 
the  next  (refer  to  the  brightness  change  constraint  assumption).  To  obtain  a  measure  of 
the  quality  of  a  given  displacement  estimate  (Ax,  Ay)  we  can  therefore  sum  the  squares  of 
the  differences  in  the  brightness  values  of  corresponding  points  in  the  neighborhoods  of  the 
original  and  displaced  image  point.  More  formally  if  £(x,  y,  t)  denotes  the  image  brightness 
at  location  (x,y)  in  frame  '  we  have 

0(Ax,A y)=  ^2  [E{x,y,t)  -  E(x  +  Ax,  y  +  Ay,f  +  T))2  (120) 

x.ytP 

as  a  measure  0  for  the  error  in  a  displacement  estimate  (Ax,  Ay).  P  is  used  as  a  symbol 
for  a  set  of  image  points  constituting  the  neighborhood  to  be  considered  in  the  correlation. 
The  displacement  estimator  minimizes  the  above  error  to  produce  (Ax*,  Ay*)  such  that 

0(Ax*,Ay*)=  min  0(Ax.Ay).  (121) 
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Figure  12:  SSD  displacement  estimation 


In  a  practical  implementation  such  as  the  one  by  Little  [33]  which  we  intend  to  use  in 
our  scheme,  0(Ax,  Ay)  is  computed  for  some  set  of  displacements  D  in  frame  t+T.  The 
resulting  set  of  error  values  is  searched  for  a  minimum  to  yield  (Ax*,  Ay*). 

In  order  to  utilize  such  an  estimate  in  a  Kalman  filter  we  need  a  quantification  of 
its  variance  or  more  precisely  the  covariance  matrix  for  [x',  y']7  which  is  the  same  as  for 
[Ax,  Ay]T: 


cou(Ax,  Ay) 


cov(  Ax,  Ay) 


We  may  assume  that  estimation  error  in  the  x-  and  y-directions  are  independent  so  that 
cov( Ax.  Ay)  =  0.  The  remaining  variances  are  due  to  the  variance  in  the  brightness 
measurement  as  described  in  the  previous  section.  However,  the  measurement  procedure 
itself  contributes  to  the  uncertainty. 

Anandan  [2]  suggested  a  “confidence  measure”  which  was  justified  intuitively  in  the 
following  way.  The  residual  error  0(Ax*,  Ay*)  is  one  component  of  the  confidence  measure: 
the  higher  it  is  the  lower  our  confidence  in  the  measure.  The  second  component  is  due  to 
the  fact  that  there  must  be  significant  variation  of  brightness  within  the  neighborhood  P 
in  order  for  t he  correlation  scheme  to  be  able  to  identify  the  neighborhood  in  the  next 
image.  In  other  words  points  in  areas  of  uniform  brightness  are  impossible  to  localize  in 
successive  images.  The  notion  of  variation  of  brightness  throughout,  the  patch  is  captured 
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by  the  curvature  of  the  SSD  function  i.e.  its  second  derivative.  The  confidence  measure 
was  therefore  essentially  the  second  derivative  of  0  divided  by  the  residual  error  in  the 
optimum  0(Ax*,A y*). 

Unfortunately,  there  is  no  formal  concept  of  a  confidence  measure  and  hence  no  deriva¬ 
tion  to  justify  it  exists  -  it  is  merely  intuitive.  It  is  also  intuitive  that  a  variance  should  be 
the  “inverse”  of  such  a  confidence  measure  although  the  connection  to  probabilistic  mea¬ 
sures  is  not  clear  either.  We  have  found  a  formal  derivation  a\x  and  a\y  which  verify 
Anandan’s  conjecture.  The  derivation  is  rather  lengthy  and  is  given  in  appendix  C.  The 
derivation  results  in  the  following  values  for  the  variances 
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Notice  that  the  denominators  contain  the  Hessian  of  0  which  indicates  the  curvature  while 
the  numerators  contain  the  residual  error  0.  These  are  the  components  of  the  confidence 
measure  suggested  h  Anandan.  The  above  variances  are  needed  in  the  Kalman  Filter 
update  process  described  below. 


6.2.2  Kalman  Filter  Update 

Before  the  actual  update  can  take  place  we  must  first  compute  the  filter  gain.  From 
(21)  we  have 

E*  =  -f  Rjt]-1 

where 


C*  = 
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is  the  Jacobian  of  the  output  equation  and 


R  fe  = 


Tix  0 
0  < 


(127) 


is  the  measurement  covariance  computed  in  the  previous  subsection.  We  see  the  increased 
computational  cost  here  as  we  must  invert  a  two-by-two  matrix  for  every  pixel  whereas  the 
previous  scheme  involved  only  a  scalar  inverse. 

Now  the  filter  update  proceeds  as  usual 


R  k  —  R/t  +  Ejt( 
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whore  x'  and  y'  arc  the  outputs  of  the  displacement  estimator.  Also  note  that  due  to  the 
fact  that  we  are  now  measuring  displacements  (i.e.  in  a  sense  the  first  two  components 
of  the  state  vector)  rather  than  the  depth,  we  must  perform  the  depth  map  interpolation 
after  the  update  instead  of  after  the  prediction.  The  predicted  output  is  compared  to  the 
displacement  estimate  in  the  update  equation  above.  Therefore  it  is  not  necessary  that  the 
depth  values  are  valid  at  grid-points  after  the  prediction  stage.  We  will,  however,  perform 
the  grid-point  interpolation  after  the  update  to  enable  motion  estimation  and  prediction  to 
operate  in  the  usual  fashion.  The  interpolation  proceeds  as  described  in  subsection  6.1.4. 


6,2.3  Motion  Estimation 


In  this  module  we  recover  the  motion  that  best  accomplishes  the  transformation  between 
frames  k  —  1  and  k  or  rather  between  our  estimates  of  the  state  variables  at  these  times.  If 
we  denote  the  image  region  under  consideration  by  P  then  we  seek  the  motion  parameters 
pr,py,pz  and  qij  for  i,  j  =  2,3,4  such  that 

)Cp[  (Xk  -  Px  +  722  Xk—  1  +  723%- 1  +  <l24%k- 1)2 
(%  -  py  +  732-%-  1  +  733%-l  +  734^-1  )2 
(Zk  ~  Pz  +  742%t-l  +  743%-l  +  744T*_i)2 

is  minimized. 

The  necessary  conditions  for  a  minimum  are  the  result  of  differentiating  the  above 
expression  with  respect  to  all  of  the  motion  parameters  and  equating  to  zero.  This  yields 
12  linear  equations  for  the  parameters.  But  since  the  above  error  sum  contains  three 
independent  terms,  we  really  have  3  four-by-four  systems  to  solve.  They  have  the  form 
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Yp  Ri,k 
Yp  Xk-iRi,k 

YpXk-iRi,k 

Yp  Zk-\Ri,k  J 

for  i  =  1,2,3.  We  have  used  the  notation  pt  to  denote  the  components  of  the  translation 
quaternion  and  Rl:k  to  denote  the  components  of  Rk,  i.e.  Xk,  Yk  and  Zk.  This  somewhat 
alleviates  the  computational  burden. 

To  complete  the  motion  estimate,  we  must  ensure  that  the  matrix 
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is  ortlionormal  or  in  other  words  that  the  rotation  quaternion  has  unit  magnitude.  We  seek 
the  nearest  orthonorrnal  matrix  to  Q  which  is 

Q  =  Q(QTQ)  —  Q(\/^7u!uJ  +  V/A2U2U2  +  N/AaUsuJ) 


(133) 


where  Aj  denotes  the  ith  eigenvalue  of  Q  and  u;  the  corresponding  unit  eigenvector.  This 
again  involves  quite  an  amount  of  computation. 

6.2.4  Kalman  Filter  Prediction 

We  note  that  the  Jacobian  of  our  dynamical  system  $*.  is  simply  the  matrix  from 
the  previous  subsection.  The  prediction  equations  become 

fyfc+1  =  P*:  +  Q 

=  q*p:q? 

where  we  have  once  again  assumed  that  the  plant  noise  is  zero. 

In  the  block-diagram  we  have  fed  the  state  estimate  back  into  the  displacement  esti¬ 
mator.  The  idea  is  to  use  this  estimate  to  limit  the  search  the  SSD-matcher  must  do.  In 
the  description  of  the  displacement  estimator  we  denoted  the  region  of  displacements  the 
estimator  would  consider  by  D.  This  region  will  now  be  a  neighborhood  of  the  predicted 
displacement  (X£ / ,Y^ / Z £)  which  limits  the  amount  of  search  considerably. 

6.2.5  Discussion 

This  scheme  is  well  suited  for  alleviating  the  problems  of  the  previous  one  namely  to 
decouple  motion  and  depth  estimation.  The  displacement  estimator  no  longer  requires  the 
motion  estimate  for  its  operation.  In  addition  our  measurement  process  is  no  longer  based 
on  gradient  approximations  but  rather  draws  information  from  a  region  of  the  image  plane. 
We  expect  this  to  result  in  greater  robustness  and  a  higher  rate  of  convergence. 

The  key  features  of  the  approach  presented  above  are  summarized  here: 

(1)  An  SSD-estimator  is  used  to  measure  displacements. 

(2)  A  discrete  object  motion  model  using  quaternions  is  employed. 

(3)  An  Extended  Kalman  Filter  is  used  to  estimate  real-world  object  coordinates. 

(4)  The  SSD-estimator  is  based  on  a  brightness  constancy  assumption.  No  additional 
assumptions  about  surface,  reflectance  or  motion  are  made. 

(5)  A  dense  depth  map  and  discrete  motion  parameters  are  recovered. 

(4)  The  scheme  is  computationally  quite  intensive.  Among  the  time  consuming  operations 
are:  searching  for  the  optimal  displacement,  matrix  inversions  for  the  gain  compu¬ 
tation,  solving  three  4x4  systems  for  motion  estimation,  renormalizing  the  rotation 
matrix. 

The  high  computational  cost  may  be  the  main  deficieny  of  this  scheme.  We  intend  to 
use  the  Connection  Machine  for  the  implementation  of  the  algorithm  in  order  to  achieve 
near- real- time  performance. 
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It  is  worth  pointing  out  some  essential  differences  between  the  algorithm  we  have  sug¬ 
gested  in  this  section  and  the  work  of  Matthies,  Szeliski  and  Kanade  [37].  In  both  cases  a 
Kalman  Filter  is  used  to  predict  and  update  a  dense  depth  and  variance  map.  However, 
the  latter  approach  is  restricted  to  translational  motion  parallel  to  the  image  plane.  In 
addition,  motion  must  be  known  at  every  point  in  time.  We  have  imposed  no  restriction 
on  motion  and  have  further  shown  how  motion  estimation  can  be  incorporated  into  the 
iterative  filtering  process.  This  makes  the  our  approach  applicable  to  a  far  more  general 
class  of  imaging  situations. 

7  Conclusion 

We  have  seen  three  conceptually  different  schemes  for  modelling  the  imaging  situation  as 
a  dynamical  system  and  using  a  state  space  observer/ filter  to  recover  parameters  of  interest 
such  as  depth  and  motion.  The  second  algorithm  presented  differs  structurally  from  the 
other  two  in  that  it  recovers  surface  parameters  instead  of  a  dense  depth  map.  Of  course  this 
requires  a  parametrization  of  the  surface  which  is  usually  unable  to  handle  discontinuities. 
But  the  parameter-based  approach  may  have  higher  robustness  and  be  well  suited  for 
specific  applications.  One  such  application  is  the  landing  approach  of  an  airplane  in  which 
we  may  approximate  the  runway  area  as  a  plane  and  seek  to  recover  the  relative  orientation 
of  aircraft  and  runway.  The  parameter-based  approach  can  naturally  be  extended  to  more 
complex  surface  structures  such  as  quadratic  patches  and  more  sophisticated  motion  models 
such  as  constant  acceleration.  Another  interesting  idea  is  to  approximate  locally  a  complex 
surface  by  planar  patches  and  apply  the  ideas  to  those  patches.  But  before  elaborating 
either  of  the  models,  they  should  prove  their  capabilities  in  an  implementation. 

The  first  algorithm  presented  in  this  paper  attempted  to  model  the  correct  physical 
relationships  involved  in  the  formation  of  brightness  values.  We  saw  that  this  forced  us 
to  make  very  strong  assumptions  about  the  reflectance  properties  of  the  surface  and  the 
illumination  conditions.  Most  real  images  will  not  satisfy  these  conditions  and  the  algorithm 
is  expected  to  operate  accurately  only  on  synthetic  data. 

The  two  variants  of  integrated  motion  and  depth-estimation  presented  last  are  most 
promising  for  application  to  real  images.  They  show  how  the  Kalman  filter  can  be  in¬ 
corporated  into  existing  motion  vision  schemes  to  achieve  incremental  depth  estimation 
over  more  than  2  frames.  We  also  noted  that  a  tradeoff  between  accuracy/robustness  and 
computation  expense  is  involved  of  which  the  presented  alternative  implementations  are 
good  examples.  The  Kalman  filter  proved  useful  for  the  depth  estimation  process  because 
this  quantity  could  be  interpreted  as  the  state  variable  of  a  dynamical  system.  Motion 
estimation,  however,  is  not  possible  without  some  dynamics  model  of  the  actuating  device. 
An  interesting  idea  would  be  to  incorporate  the  dynamical  model  of  a  mobile  robot,  which 
carries  the  camera  into  the  plant  equations,  and  apply  the  Kalman  filter  for  simultaneous 
depth  and  motion  estimation. 

The  depth  map  approaches  also  reveal  why  the  application  of  filtering  theory  has  pre¬ 
viously  focused  on  feature  matching:  displacements  of  features  are  known  rather  precisely 
and  provide  a  good  measurement  for  the  filter.  On  the  other  hand  they  have  the  deficiencies 
discussed  earlier  which  led  us  to  consider  non-feature  based  algorithms. 
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It  is  important  to  mention  that  the  dynamical  systems  approach  is  not  limited  to  the 
application  of  filters  for  depth  and  motion  estimation.  There  are  two  other  applications 
that  come  to  mind.  What  for  instance  would  it  mean  to  close  the  feedback  loop  over  one 
of  the  dynamical  systems  presented  above?  It  means  that  we  measure  some  indication  of 
motion  in  the  image  plane  and  feed  it  back  to  control  the  motion  of  the  camera.  This  could 
be  used  in  tracking  or  some  other  orientation  procedure  of  the  camera  carrier  such  as  in 
mobile  robots  or  aircrafts  and  satellites. 

Another  powerful  technique  that  one  could  import  from  control  theory  is  system  iden¬ 
tification.  Sophisticated  techniques  have  been  developed  to  estimate  the  parameters  of 
dynamical  systems  which  are  the  constants  in  the  plant  and  output  equations.  What  does 
it  mean  to  perform  system  identification  on  one  of  our  models?  The  model  in  section  4 
which  is  based  on  brightness  values  contains  the  diameter  of  the  camera  lens  and  all  models 
implicitly  contain  the  focal  length.  We  can  use  one  of  these  identification  procedures  to 
measure  these  quantities.  Identification  may  also  aid  in  dealing  with  more  complex  motion 
models  as  the  ones  presented  in  equation  (72).  In  this  case  we  would  have  to  perform  online 
identification  i.e.  identify  the  parameters  while  the  observer  estimation  is  being  performed. 
This  may  lead  to  interesting  and  powerful  models. 

The  main  objective  in  this  paper  was  to  introduce  a  systematic  way  of  dealing  with  a 
series  of  frames  in  motion  vision.  We  have  shown  that  dynamical  systems  provide  a  way 
for  dealing  explicitly  with  time  dependency  and  have  formulated  the  solution  to  a  common 
motion  vision  problem  in  terms  of  such  systems.  It  is  conceivable  that  is  approach  can  be 
extended  to  other  problems  in  which  information  is  acquired  over  a  series  of  frames  such 
as  multiframe  edge-detection,  segmentation,  color,  texture  etc.  Unlike  previous  approaches 
mentioned  in  the  introduction  we  have  attempted  to  found  our  models  rigorously  on  physical 
facts  only.  We  intend  to  carry  out  experiments  to  corroborate  our  theoretical  results. 
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Appendix 

A  System  Matrices  for  the  BCCE  Observer 


In  section  5.3  we  found  that  the  matrix 


and  the  vector 


A  5f 
A  k  =  rr- 

9 X  Xfc 


dx^k,yk 


are  necessary  to  compute  the  observer  gain  vector  e*.  The  vector  function  f(x^)  is  given 
by  (74),  the  implicit  output  g(xk,  by  (75).  In  this  appendix  we  give  the  elements  of 
the  above  matrices  for  convenient  computation.  We  omit  the  hat  representing  estimated 
values  and  the  index  k  of  time.  We  use  the  following  abbreviations: 

•  The  surface  normal  n  =  [nz,ny,nz]T 

•  The  translational  displacement  p  =  [px,py,Pz}T 

•  The  position  vector  of  an  image  plane  point  r  =  [x,y,  1]T 

•  The  brightness  gradients  Ex,  Ey  and  Et 

•  A  vector  x  has  a  corresponding  purely  imaginary  quaternon  x. 

•  The  rotation  9  is  a  unit  quaternion  so  =  1  -  g2  -  -  qz 

The  elements  of  A  are: 


02  ,  2  _  02  _  2  ,  1  +  nvPy  + 

<7o  +  <7x  9»  9*+  (i  +  n.p)2 


=  2{qxqv  -  q0qz)  - 
=  2(9xg2  +  9o9y) - 


(1  +  n  •  p)2 
nxPy 

(1  +  n  •  p)2 


(1  +  n  ■  p)2 
nxny 

(1  +  n-p)2 
nxtiz 

(1  +  n-  p)2 

2 ny(qy  +  Mi)  +  2 nz(qz  -  Mi) 
go  go 


-4nxqy  +  2  ny(qx  +  +  2  nz{q0 - -) 

go  go 

2 

-4raxgz  +  2ny(-q0  +  — )  +  2nz(qx  -  ^iii) 

go  go 


2(gxgv  +  gog*)  - 
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The  elements  of  c  are: 
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w  h  m 
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X=  —  wy=z—h 
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B  An  Efficient  Pixel  Interpolation  Scheme 

When  computing  Kalman  filter  updates  or  predictions  of  depth  and  variance  maps  the 
problem  arises  that  the  updated  value  is  no  longer  valid  at  a  grid-point  of  the  pixel-array 
because  the  projection  of  the  point  under  observation  has  moved.  The  proposed  solution 
in  sections  6.1.4  and  6.2.2  was  to  reinterpolate  the  grid-point  values. 

Here,  we  give  a  computationally  efficient  solution  for  the  following  problem:  Given  n 
points  (xi,  yt)  and  a  function  value  Z(xi,  yi )  at  those  points  interpolate  Z  at  a  point  (x,  y). 
We  will  compute  the  interpolated  function  value  as  a  weighted  sum 

n 

Z(z',!/')  =  Ewi2(ii,yi).  (141) 

«=i 

A  good  weighting  function  should  fulfill  the  following  requirements: 

•  0  <  W{  <  1 
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•  E"=1  tp,-  =  i 

•  The  u>i  should  decrease  as  the  distance  d%  =  (xi  —  x')2  4-  (y,  —  y')2  decreases. 
We  therefore  choose  the  weighting  factors  to  be 


S* 

This  clearly  fulfills  all  of  the  requirements  but  some  special  cases  must  be  considered. 
Suppose  all  n  estimates  involved  in  the  interpolation  have  equal  distance  dt  =  d  from  the 
grid-point.  In  this  case 

Wi  =  _k_  =  l  (143) 

v  i  71 

;=i 

which  means  that  all  estimates  are  weighted  equally  as  one  would  expect.  A  more  tricky 
case  occurs  when  some  number  k  of  the  estimates  are  actually  located  at  the  grid-point, 
i.e.  di  =  0  for  i  =  1, . . . ,  k.  For  estimates  on  the  grid-point  we  can  rearrange  the  expression 
for  the  weights  (142)  to  obtain 


as  di  — ►  0  for  i  =  1, . . . ,  k.  Similarly  we  rearrange  the  expression  for  the  weights  in  the  case 
of  an  estimate  that  does  not  coincide  with  the  grid-point 


as  di  — »  0  for  t  =  1  ,...,k.  In  other  words  we  ignore  all  estimates  that  are  not  on  the 
grid-point  and  obtain  the  interpolated  value  as  the  mean  of  the  estimates  located  at  the 
grid-point. 

This  may  be  very  easily  encoded  into  an  efficient  0(n)  algorithm  which  we  give  in  the 
following  pseudocode  notation: 

fc  *-  0;  W  <—  0; 

For  i  <— 

if  ({■  =  0  then  k  «—  k  +  1  else  W  <—  W  +  jj-; 


g 

8 

■»*i 


if  k  >  0  then 


■i 

I 


a 


For  i  *—  1, . . . ,  n 

if  d2  =  0  then  Wi  <—  %  else  in,-  <—  0; 


For  i  * —  1, ...  ,n 
Wi  -  jr/W- 

When  dealing  with  real  estimates  they  will  rarely  fall  clir^'c Ll>  onto  the  grid-point  and 
we  may  wish  to  replace  the  test  d2  =  0  in  the  above  algorithm  with  <  (  where  e  is  chosen 
according  to  the  numerical  accuracy  of  our  processor. 

C  Variance  of  the  SSD-Displacement  Estimator 

In  this  appendix  we  give  the  derivation  of  the  variances  <j\x  and  a\y  of  an  SSD  dis¬ 
placement  estimate.  Recall  that  in  section  6.2.1  we  have  introduced  the  SSD  functional 
120 

0(Ax,Ay)  =  Y2  [E(,x,y,  0  -  E(x  +  Ax,  y  +  Ay,  t  -f  T)]2  (146) 

x,yiP 

which  quantified  the  quality  of  a  match  between  point  (x,  y)  in  frame  t  and  point  ( x  + 
Aa ;,y  -I-  Ay)  in  frame  t  +  T.  The  optimal  displacement  (Ax*,  Ay*)  was  defined  by 

0(Ax*,Ay*)  =  min  0(Ax,Ay).  (147) 

Ax, A  yc.D 

In  the  following  we  abbreviate  the  second  partial  derivatives  of  0  with  respect  to  Ax 
and  Ay  by  0X  and  0y  and  the  second  partials  by  0XX,  0xy  and  0yy. 

The  displacement  estimate  is  the  result  of  minimizing  (146).  Differentiating  this  equa¬ 
tion  with  respect  to  Ax  and  Ay  yields  the  necessary  conditions  for  the  minimum: 


0x(Ax,  Ay)  =  -2  YZ  [£(z,y,0  -  E(x  +  Ax,y  +  Ay,t  +  T)] 


l(x+Ax,y+Ay,<+T) 


0y(Ax,A y)  =  -2  [E(x,y,t)  -  E(x  +  Ax,y  +  A y,t  +  T)]  —  (149) 

x,y tP  (x+Ax,y+A y,t+T) 

must  both  be  equal  to  0.  Assuming  that  the  variance  in  a  measurement  of  E  is  a\  and  in 
a  gradient  measurement  of  is  2a2Ejd2x  (by  applying  the  variance-propagation  (87)  to  the 
gradient  approximations  (90)  above)  we  can  determine  the  variance  in  0X  and  0y. 

We  first  introduce  two  new  sets  of  random  variables  Ax(x,y,t)  and  Ay(x,y,t)  where 


Ax(z,y,t)  =  [E(x,y,t)-E(x  +  Ax,y  +  Ay,t  +  T)]  —  (150) 

<-'x  (x+Ax,y+Ay,(+T) 

Ay{x,y,t)  =  [E{x,  y,  t)  -  E(x  +  Ax,  y  +  Ay,  t  +  T)}  (151) 

Vi!  (x+Ax,y+A y.t+T) 
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which  are  simply  the  terms  being  summed  in  (148)  and  (149).  Using  the  formula  for  the 
propagation  of  variance-  of  independent  random  variables  (87)  we  easily  see  that 
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dE 


dx 


(x+Ar.y+Ay.t+T) 
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^(E(x,y,t)  -  E( x  +  Ax,y  +  Ay,*  +  X1))2]. 

With  these  abbreviations  we  can  write 

0X  =  -2  Ax(x,y,t) 

x,ytP 

0y  =  —2  Ay(l,  y,  t). 

x,y  tP 

Again  we  make  use  of  the  variance  propagation  formula  (87)  to  obtain 
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■4  X]  (£(*»  0  -  E(x  +  Aar,  y  +  Ay,  *  +  T))2)  (157) 

®V  x,ytP 

where  we  recognize  the  second  sum  in  each  of  the  expressions  to  be  0. 

On  the  other  hand  we  can  use  the  variance' propagation  (87)  on  (146)  to  determine  the 
immediate  dependency 


4,  =  0L4r+02y4y 

4V  =  02y4*  +  0yy  4y 


(158) 

(159) 


It  remains  to  substitute  the  expressions  for  and  cr|  from  (157)  into  the  above  equations 
and  solve  for  the  desired  variances  a\x  and  CTAy  of  the  displacement  components. 
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We  can  obtain  an  expression  in  analogy  to  the  confidence  measure  suggested  by  Anandan 
by  observing  that  the  pixel  distances  dx  and  dy  are  small  quantities  so  that  the  parts  of 
the  numerators  which  have  a  coefficient  1  /d2  or  1  /d2  will  tend  to  dominate  the  remaining 
terms.  The  expressions  for  the  variance  are  then 
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in  which  we  recognize  the  denominators  to  contain  the  Hessian  of  0  which  indicates  the 
curvature.  All  expressions  containing  0  must  of  course  be  evaluated  for  the  estimated 
displacement  (Ax*,  Ay*).  In  practice  it  may  be  too  tedious  to  approximate  all  the  necessary 
derivatives  of  0  when  computing  the  variances  so  that  further  simplifications  are  necessary. 
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